TENTH EDITION


Elementary Linear Algebra


APPLICATIONS VERSION 




HOWARD ANTON / CHRIS RORRES 


About The Author 


Howard Anton obtained his B.A. from Lehigh University, his M.A. from the University of Illinois, and his 
Ph.D. from the Polytechnic University of Brooklyn, all in mathematics. In the early 1960s he worked for 
Burroughs Corporation and Avco Corporation at Cape Canaveral, Florida, where he was involved with the 
manned space program. In 1968 he joined the Mathematics Department at Drexel University, where he taught 
full time until 1983. Since then he has devoted the majority of his time to textbook writing and activities for 
mathematical associations. Dr. Anton was president of the EPADEL Section of the Mathematical Association 
of America (MAA), served on the Board of Governors of that organization, and guided the creation of the 
Student Chapters of the MAA. In addition to various pedagogical articles, he has published numerous 
research papers in functional analysis, approximation theory, and topology. He is best known for his textbooks 
in mathematics, which are among the most widely used in the world. There are currently more than 150 
versions of his books, including translations into Spanish, Arabic, Portuguese, Italian, Indonesian, French, 
Japanese, Chinese, Hebrew, and German. For relaxation, Dr. Anton enjoys travel and photography. 


Copyright © 2010 John Wiley & Sons, Inc. All rights reserved. 


Preface 


This edition of Elementary Linear Algebra gives an introductory treatment of linear algebra that is suitable for 
a first undergraduate course. Its aim is to present the fundamentals of linear algebra in the clearest possible 
way—sound pedagogy is the main consideration. Although calculus is not a prerequisite, there is some 
optional material that is clearly marked for students with a calculus background. If desired, that material can 
be omitted without loss of continuity. 


Technology is not required to use this text, but for instructors who would like to use MATLAB, Mathematica, 
Maple, or calculators with linear algebra capabilities, we have posted some supporting material that can be 
accessed at either of the following Web sites: 


www.howardanton.com 


www.wiley.com/college/anton 


Summary of Changes in this Edition 


This edition is a major revision of its predecessor. In addition to including some new material, some of the old 
material has been streamlined to ensure that the major topics can all be covered in a standard course. These 
are the most significant changes: 


Vectors in 2-space, 3-space, and n-space Chapters 3 and 4 of the previous edition have been combined 
into a single chapter. This has enabled us to eliminate some duplicate exposition and to juxtapose concepts 
in n-space with those in 2-space and 3-space, thereby conveying more clearly how n-space ideas generalize 
those already familiar to the student. 


New Pedagogical Elements Each section now ends with a Concept Review and a Skills mastery that 
provide the student a convenient reference to the main ideas in that section. 


New Exercises Many new exercises have been added, including a set of True/False exercises at the end of 
most sections. 


Earlier Coverage of Eigenvalues and Eigenvectors The chapter on eigenvalues and eigenvectors, which 
was Chapter 7 in the previous edition, is Chapter 5 in this edition. 


Complex Vector Spaces The chapter entitled Complex Vector Spaces in the previous edition has been 
completely revised. The most important ideas are now covered in Section 5.3 and Section 7.5 in the context 
of matrix diagonalization. A brief review of complex numbers is included in the Appendix. 


Quadratic Forms This material has been extensively rewritten to focus more precisely on the most 
important ideas. 


New Chapter on Numerical Methods In the previous edition an assortment of topics appeared in the last 
chapter. That chapter has been replaced by a new chapter that focuses exclusively on numerical methods of 
linear algebra. We achieved this by moving those topics not concerned with numerical methods elsewhere 
in the text. 


Singular-Value Decomposition In recognition of its growing importance, a new section on Singular-Value 
Decomposition has been added to the chapter on numerical methods. 


Internet Search and the Power Method A new section on the Power Method and its application to 
Internet search engines has been added to the chapter on numerical methods. 


Applications There is an expanded version of this text by Howard Anton and Chris Rorres entitled 
Elementary Linear Algebra: Applications Version, 10th (ISBN 9780470432051), whose purpose is to 
supplement this version with an extensive body of applications. However, to accommodate instructors who 
asked us to include some applications in this version of the text, we have done so. These are generally less 
detailed than those appearing in the Anton/Rorres text and can be omitted without loss of continuity. 


Hallmark Features 


Relationships Among Concepts One of our main pedagogical goals is to convey to the student that linear 
algebra is a cohesive subject and not simply a collection of isolated definitions and techniques. One way in 
which we do this is by using a crescendo of Equivalent Statements theorems that continually revisit 
relationships among systems of equations, matrices, determinants, vectors, linear transformations, and 
eigenvalues. To get a general sense of how we use this technique see Theorems 1.5.3, 1.6.4, 2.3.8, 4.8.10, 
4.10.4 and then Theorem 5.1.6, for example. 


Smooth Transition to Abstraction Because the transition from $R^n$ to general vector spaces is difficult for 
many students, considerable effort is devoted to explaining the purpose of abstraction and helping the 
student to “visualize” abstract ideas by drawing analogies to familiar geometric ideas. 


Mathematical Precision When reasonable, we try to be mathematically precise. In keeping with the level 
of student audience, proofs are presented in a patient style that is tailored for beginners. There is a brief 
section in the Appendix on how to read proof statements, and there are various exercises in which students 
are guided through the steps of a proof and asked for justification. 


Suitability for a Diverse Audience This text is designed to serve the needs of students in engineering, 
computer science, biology, physics, business, and economics as well as those majoring in mathematics. 


Historical Notes To give the students a sense of mathematical history and to convey that real people 
created the mathematical theorems and equations they are studying, we have included numerous Historical 
Notes that put the topic being studied in historical perspective. 


About the Exercises 


Graded Exercise Sets Each exercise set begins with routine drill problems and progresses to problems 
with more substance. 


True/False Exercises Most exercise sets end with a set of True/False exercises that are designed to check 
conceptual understanding and logical reasoning. To avoid pure guessing, the students are required to justify 
their responses in some way. 

Supplementary Exercise Sets Most chapters end with a set of supplementary exercises that tend to be 
more challenging and force the student to draw on ideas from the entire chapter rather than a specific 
section. 


Supplementary Materials for Students 


• Student Solutions Manual This supplement provides detailed solutions to most theoretical exercises and 
to at least one nonroutine exercise of every type (ISBN 9780470458228). 


• Technology Exercises and Data Files The technology exercises that appeared in the previous edition have 
been moved to the Web site that accompanies this text. Those exercises are designed to be solved using 
MATLAB, Mathematica, or Maple and are accompanied by data files in all three formats. The exercises and 
data can be downloaded from either of the following Web sites. 


www.howardanton.com 


www.wiley.com/college/anton 


Supplementary Materials for Instructors 


• Instructor's Solutions Manual This supplement provides worked-out solutions to most exercises in the 
text (ISBN 9780470458235). 


• WileyPLUS™ This is Wiley's proprietary online teaching and learning environment that integrates a 
digital version of this textbook with instructor and student resources to fit a variety of teaching and learning 
styles. WileyPLUS will help your students master concepts in a rich and structured environment that is 
available to them 24/7. It will also help you to personalize and manage your course more effectively with 
student assessments, assignments, grade tracking, and other useful tools. 


• Your students will receive timely access to resources that address their individual needs and will 
receive immediate feedback and remediation resources when needed. 


• There are also self-assessment tools that are linked to the relevant portions of the text that will enable 
your students to take control of their own learning and practice. 


• WileyPLUS will help you to identify those students who are falling behind and to intervene in a 
timely manner without waiting for scheduled office hours. 


More information about WileyPLUS can be obtained from your Wiley representative. 


A Guide for the Instructor 


Although linear algebra courses vary widely in content and philosophy, most courses fall into two categories: 
those with about 35-40 lectures and those with about 25-30 lectures. Accordingly, we have created long 
and short templates as possible starting points for constructing a course outline. Of course, these are just 
guides, and you will certainly want to customize them to fit your local interests and requirements. Neither of 
these sample templates includes applications. Those can be added, if desired, as time permits. 


                                                       Long Template   Short Template
Chapter 1: Systems of Linear Equations and Matrices    7 lectures      6 lectures
Chapter 2: Determinants                                3 lectures      2 lectures
Chapter 3: Euclidean Vector Spaces                     4 lectures      3 lectures
Chapter 4: General Vector Spaces                       10 lectures     10 lectures
Chapter 5: Eigenvalues and Eigenvectors                3 lectures      3 lectures
Chapter 6: Inner Product Spaces                        3 lectures      1 lecture
Chapter 7: Diagonalization and Quadratic Forms         4 lectures      3 lectures
Chapter 8: Linear Transformations                      3 lectures      2 lectures
Total:                                                 37 lectures     30 lectures


Acknowledgements 


I would like to express my appreciation to the following people whose helpful guidance has greatly improved 
the text. 


Reviewers and Contributors 


Don Allen, Texas A&M University 

John Alongi, Northwestern University 

John Beachy, Northern Illinois University 

Przemslaw Bogacki, Old Dominion University 

Robert Buchanan, Millersville University of Pennsylvania 
Ralph Byers, University of Kansas 

Evangelos A. Coutsias, University of New Mexico 

Joshua Du, Kennesaw State University 

Fatemeh Emdad, Michigan Technological University 
Vincent Ervin, Clemson University 

Anda Gadidov, Kennesaw State University 

Guillermo Goldsztein, Georgia Institute of Technology 
Tracy Hamilton, California State University, Sacramento 
Amanda Hattway, Wentworth Institute of Technology 
Heather Hulett, University of Wisconsin—La Crosse 

David Hyeon, Northern Illinois University 

Matt Insall, Missouri University of Science and Technology 
Mic Jackson, Earlham College 

Anton Kaul, California Polytechnic Institute, San Luis Obispo 


Harihar Khanal, Embry-Riddle University 

Hendrik Kuiper, Arizona State University 

Kouok Law, Georgia Perimeter College 

James McKinney, California State University, Pomona 
Eric Schmutz, Drexel University 

Qin Sheng, Baylor University 

Adam Sikora, State University of New York at Buffalo 
Allan Silberger, Cleveland State University 

Dana Williams, Dartmouth College 


Mathematical Advisors 

Special thanks are due to a number of talented teachers and mathematicians who provided pedagogical 
guidance, provided help with answers and exercises, or provided detailed checking or proofreading: 
John Alongi, Northwestern University 

Scott Annin, California State University, Fullerton 

Anton Kaul, California Polytechnic State University 

Sarah Streett 

Cindy Trimble, C Trimble and Associates 


Brad Davis, C Trimble and Associates 


The Wiley Support Team 


David Dietz, Senior Acquisitions Editor 

Jeff Benson, Assistant Editor 

Pamela Lashbrook, Senior Editorial Assistant 
Janet Foxman, Production Editor 

Maddy Lesure, Senior Designer 

Laurie Rosatone, Vice President and Publisher 
Sarah Davis, Senior Marketing Manager 
Diana Smith, Marketing Assistant 

Melissa Edwards, Media Editor 

Lisa Sabatini, Media Project Manager 
Sheena Goldstein, Photo Editor 

Carol Sawyer, Production Manager 


Lilian Brady, Copyeditor 


Special Contributions 


The talents and dedication of many individuals are required to produce a book such as this, and I am fortunate 
to have benefited from the expertise of the following people: 


David Dietz — my editor, for his attention to detail, his sound judgment, and his unwavering faith in me. 


Jeff Benson — my assistant editor, who did an unbelievable job in organizing and coordinating the many 
threads required to make this edition a reality. 


Carol Sawyer — of The Perfect Proof, who coordinated the myriad of details in the production process. It 
will be a pleasure to finally delete from my computer the hundreds of emails we exchanged in the course of 
working together on this book. 


Scott Annin — California State University at Fullerton, who critiqued the previous edition and provided 
valuable ideas on how to improve the text. I feel fortunate to have had the benefit of Prof. Annin's teaching 
expertise and insights. 


Dan Kirschenbaum — of The Art of Arlene and Dan Kirschenbaum, whose artistic and technical expertise 
resolved some difficult and critical illustration issues. 


Bill Tuohy — who read parts of the manuscript and whose critical eye for detail had an important influence 
on the evolution of the text. 


Pat Anton — who proofread manuscript, when needed, and shouldered the burden of household chores to 
free up time for me to work on this edition. 


Maddy Lesure — our text and cover designer whose unerring sense of elegant design is apparent in the 
pages of this book. 


Rena Lam — of Techsetters, Inc., who did an absolutely amazing job of wading through a nightmare of 
author edits, scribbles, and last-minute changes to produce a beautiful book. 


John Rogosich — of Techsetters, Inc., who skillfully programmed the design elements of the book and 
resolved numerous thorny typesetting issues. 


Lilian Brady — my copyeditor of many years, whose eye for typography and whose knowledge of language 
never ceases to amaze me. 


The Wiley Team — There are many other people at Wiley who worked behind the scenes and to whom I owe 
a debt of gratitude: Laurie Rosatone, Ann Berlin, Dorothy Sinclair, Janet Foxman, Sarah Davis, Harry Nolan, 
Sheena Goldstein, Melissa Edwards, and Norm Christiansen. Thanks to you all. 




CHAPTER 1


Systems of Linear Equations and Matrices


CHAPTER CONTENTS


1.1  Introduction to Systems of Linear Equations
1.2  Gaussian Elimination
1.3  Matrices and Matrix Operations
1.4  Inverses; Algebraic Properties of Matrices
1.5  Elementary Matrices and a Method for Finding $A^{-1}$
1.6  More on Linear Systems and Invertible Matrices
1.7  Diagonal, Triangular, and Symmetric Matrices
1.8  Applications of Linear Systems
     • Network Analysis (Traffic Flow)
     • Electrical Circuits
     • Balancing Chemical Equations
     • Polynomial Interpolation
1.9  Leontief Input-Output Models


INTRODUCTION 


Information in science, business, and mathematics is often organized into rows and 
columns to form rectangular arrays called “matrices” (plural of “matrix”). Matrices often 
appear as tables of numerical data that arise from physical observations, but they occur in 
various mathematical contexts as well. For example, we will see in this chapter that all of 
the information required to solve a system of equations such as 

\[
\begin{aligned}
5x + y &= 3 \\
2x - y &= 4
\end{aligned}
\]

is embodied in the matrix 

\[
\begin{bmatrix}
5 & 1 & 3 \\
2 & -1 & 4
\end{bmatrix}
\]


and that the solution of the system can be obtained by performing appropriate operations 
on this matrix. This is particularly important in developing computer programs for solving 
systems of equations because computers are well suited for manipulating arrays of 
numerical information. However, matrices are not simply a notational tool for solving 
systems of equations; they can be viewed as mathematical objects in their own right, and 
there is a rich and important theory associated with them that has a multitude of practical 
applications. It is the study of matrices and related topics that forms the mathematical field 
that we call “linear algebra.” In this chapter we will begin our study of matrices. 




1.1 Introduction to Systems of Linear Equations 


Systems of linear equations and their solutions constitute one of the major topics that we will study in this 
course. In this first section we will introduce some basic terminology and discuss a method for solving such 
systems. 


Linear Equations 


Recall that in two dimensions a line in a rectangular xy-coordinate system can be represented by an equation of 
the form 

\[
ax + by = c \quad (a, b \text{ not both } 0)
\]

and in three dimensions a plane in a rectangular xyz-coordinate system can be represented by an equation of the 
form 

\[
ax + by + cz = d \quad (a, b, c \text{ not all } 0)
\]

These are examples of “linear equations,” the first being a linear equation in the variables x and y and the second 
a linear equation in the variables x, y, and z. More generally, we define a linear equation in the n variables 
$x_1, x_2, \ldots, x_n$ to be one that can be expressed in the form 

\[
a_1x_1 + a_2x_2 + \cdots + a_nx_n = b \tag{1}
\]

where $a_1, a_2, \ldots, a_n$ and $b$ are constants, and the a's are not all zero. In the special cases where n = 2 or n = 3, 
we will often use variables without subscripts and write linear equations as 

\[
a_1x + a_2y = b \quad (a_1, a_2 \text{ not both } 0) \tag{2}
\]

\[
a_1x + a_2y + a_3z = b \quad (a_1, a_2, a_3 \text{ not all } 0) \tag{3}
\]

In the special case where b = 0, Equation (1) has the form 

\[
a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0 \tag{4}
\]

which is called a homogeneous linear equation in the variables $x_1, x_2, \ldots, x_n$. 

EXAMPLE 1 Linear Equations 

Observe that a linear equation does not involve any products or roots of variables. All variables 
occur only to the first power and do not appear, for example, as arguments of trigonometric, 
logarithmic, or exponential functions. The following are linear equations: 

\[
\begin{aligned}
x + 3y &= 7 & x_1 - 2x_2 - 3x_3 + x_4 &= 0 \\
\tfrac{1}{2}x - y + 3z &= -1 & x_1 + x_2 + \cdots + x_n &= 1
\end{aligned}
\]

The following are not linear equations: 

\[
\begin{aligned}
x + 3y^2 &= 4 & 3x + 2y - xy &= 5 \\
\sin x + y &= 0 & \sqrt{x_1} + 2x_2 + x_3 &= 1
\end{aligned}
\]

A finite set of linear equations is called a system of linear equations or, more briefly, a linear system. The 
variables are called unknowns. For example, system (5) that follows has unknowns x and y, and system (6) has 
unknowns $x_1$, $x_2$, and $x_3$. 

\[
\begin{aligned}
5x + y &= 3 \\
2x - y &= 4
\end{aligned}
\tag{5}
\]

\[
\begin{aligned}
4x_1 - x_2 + 3x_3 &= -1 \\
3x_1 + x_2 + 9x_3 &= -4
\end{aligned}
\tag{6}
\]


A general linear system of m equations in the n unknowns $x_1, x_2, \ldots, x_n$ can be written as 

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\tag{7}
\]

The double subscripting on the coefficients $a_{ij}$ of the unknowns gives their location in the 
system: the first subscript indicates the equation in which the coefficient occurs, and the second 
indicates which unknown it multiplies. Thus, $a_{12}$ is in the first equation and multiplies $x_2$. 


A solution of a linear system in n unknowns $x_1, x_2, \ldots, x_n$ is a sequence of n numbers $s_1, s_2, \ldots, s_n$ for which 
the substitution 

\[
x_1 = s_1, \quad x_2 = s_2, \quad \ldots, \quad x_n = s_n
\]

makes each equation a true statement. For example, the system in (5) has the solution 

\[
x = 1, \quad y = -2
\]

and the system in (6) has the solution 

\[
x_1 = 1, \quad x_2 = 2, \quad x_3 = -1
\]

These solutions can be written more succinctly as 

\[
(1, -2) \quad \text{and} \quad (1, 2, -1)
\]

in which the names of the variables are omitted. This notation allows us to interpret these solutions geometrically 
as points in two-dimensional and three-dimensional space. More generally, a solution 

\[
x_1 = s_1, \quad x_2 = s_2, \quad \ldots, \quad x_n = s_n
\]

of a linear system in n unknowns can be written as 

\[
(s_1, s_2, \ldots, s_n)
\]

which is called an ordered n-tuple. With this notation it is understood that all variables appear in the same order 
in each equation. If n = 2, then the n-tuple is called an ordered pair, and if n = 3, then it is called an ordered 
triple. 
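
The definition of a solution can be checked mechanically: substitute the n-tuple into every equation and compare both sides. The following is a minimal sketch in Python, not part of the original text; the function name `is_solution` and the list-based representation of a system are illustrative assumptions. It uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction

def is_solution(coefficients, constants, candidate):
    """Return True if the n-tuple `candidate` satisfies every equation.

    coefficients[i] holds the coefficients of equation i, and
    constants[i] is its right-hand side (hypothetical representation).
    """
    for row, b in zip(coefficients, constants):
        lhs = sum(Fraction(a) * Fraction(s) for a, s in zip(row, candidate))
        if lhs != Fraction(b):
            return False
    return True

# System (5): 5x + y = 3, 2x - y = 4
system5 = ([[5, 1], [2, -1]], [3, 4])
# System (6): 4x1 - x2 + 3x3 = -1, 3x1 + x2 + 9x3 = -4
system6 = ([[4, -1, 3], [3, 1, 9]], [-1, -4])

print(is_solution(*system5, (1, -2)))     # True
print(is_solution(*system6, (1, 2, -1)))  # True
print(is_solution(*system5, (1, 2)))      # False: 5(1) + 2 = 7, not 3
```

Representing a system as a list of coefficient rows plus a list of constants is only one possible choice; it anticipates the augmented matrix introduced later in this section.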


Linear Systems with Two and Three Unknowns 


Linear systems in two unknowns arise in connection with intersections of lines. For example, consider the linear 
system 


\[
\begin{aligned}
a_1x + b_1y &= c_1 \\
a_2x + b_2y &= c_2
\end{aligned}
\]
in which the graphs of the equations are lines in the xy-plane. Each solution (x, y) of this system corresponds to a 
point of intersection of the lines, so there are three possibilities (Figure 1.1.1): 
1. The lines may be parallel and distinct, in which case there is no intersection and consequently no solution. 
2. The lines may intersect at only one point, in which case the system has exactly one solution. 


3. The lines may coincide, in which case there are infinitely many points of intersection (the points on the 
common line) and consequently infinitely many solutions. 


[Figure 1.1.1: three panels showing two lines in the xy-plane, labeled “No solution,” “One solution,” and 
“Infinitely many solutions (coincident lines)”] 
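
The three possibilities for two lines can be told apart by arithmetic on the coefficients alone. The sketch below is an illustration, not the book's method: it assumes each equation really is a line (its two coefficients are not both zero), and the function name `classify_two_lines` is hypothetical. The quantity `a1*b2 - a2*b1` is zero exactly when the lines are parallel or coincident.

```python
def classify_two_lines(a1, b1, c1, a2, b2, c2):
    """Classify the system a1*x + b1*y = c1, a2*x + b2*y = c2.

    Assumes (a1, b1) and (a2, b2) are each not both zero, so that
    both equations describe lines.
    """
    if a1 * b2 - a2 * b1 != 0:
        # The lines have different directions and meet in one point.
        return "one solution"
    # Same direction: the lines coincide only if the constants are in
    # the same proportion as the coefficients.
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        return "infinitely many solutions"
    return "no solution"

print(classify_two_lines(1, -1, 1, 2, 1, 6))    # one solution
print(classify_two_lines(1, 1, 4, 3, 3, 6))     # no solution
print(classify_two_lines(4, -2, 1, 16, -8, 4))  # infinitely many solutions
```

The three sample calls use the systems that appear in Examples 2, 3, and 4 of this section.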


In general, we say that a linear system is consistent if it has at least one solution and inconsistent if it has no 
solutions. Thus, a consistent linear system of two equations in two unknowns has either one solution or infinitely 
many solutions; there are no other possibilities. The same is true for a linear system of three equations in three 
unknowns 

\[
\begin{aligned}
a_1x + b_1y + c_1z &= d_1 \\
a_2x + b_2y + c_2z &= d_2 \\
a_3x + b_3y + c_3z &= d_3
\end{aligned}
\]

in which the graphs of the equations are planes. The solutions of the system, if any, correspond to points where 
all three planes intersect, so again we see that there are only three possibilities: no solutions, one solution, or 
infinitely many solutions (Figure 1.1.2). 


[Figure 1.1.2: eight panels showing configurations of three planes, labeled as follows. 
No solutions (three parallel planes; no common intersection). 
No solutions (two parallel planes; no common intersection). 
No solutions (no common intersection). 
No solutions (two coincident planes parallel to the third; no common intersection). 
One solution (intersection is a point). 
Infinitely many solutions (intersection is a line). 
Infinitely many solutions (planes are all coincident; intersection is a plane). 
Infinitely many solutions (two coincident planes; intersection is a line).] 


We will prove later that our observations about the number of solutions of linear systems of two equations in two 
unknowns and linear systems of three equations in three unknowns actually hold for all linear systems. That is: 


Every system of linear equations has zero, one, or infinitely many solutions. There are no other 
possibilities. 


EXAMPLE 2 A Linear System with One Solution 


Solve the linear system 

\[
\begin{aligned}
x - y &= 1 \\
2x + y &= 6
\end{aligned}
\]

Solution We can eliminate x from the second equation by adding $-2$ times the first equation to 
the second. This yields the simplified system 

\[
\begin{aligned}
x - y &= 1 \\
3y &= 4
\end{aligned}
\]

From the second equation we obtain $y = \frac{4}{3}$, and on substituting this value in the first equation we 
obtain $x = 1 + y = \frac{7}{3}$. Thus, the system has the unique solution 

\[
x = \tfrac{7}{3}, \quad y = \tfrac{4}{3}
\]

Geometrically, this means that the lines represented by the equations in the system intersect at the 
single point $\left(\frac{7}{3}, \frac{4}{3}\right)$. We leave it for you to check this by graphing the lines. 
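
The elimination and back-substitution in Example 2 can be replayed with exact arithmetic. This is a minimal sketch, not code from the text; storing each equation as a list `[a, b, c]` for a*x + b*y = c is an illustrative assumption.

```python
from fractions import Fraction

# Each equation a*x + b*y = c is stored as [a, b, c] (illustrative layout).
eq1 = [Fraction(1), Fraction(-1), Fraction(1)]  # x - y = 1
eq2 = [Fraction(2), Fraction(1), Fraction(6)]   # 2x + y = 6

# Add -2 times the first equation to the second to eliminate x.
multiplier = -eq2[0] / eq1[0]
eq2_new = [multiplier * p + q for p, q in zip(eq1, eq2)]  # 0x + 3y = 4

# Solve the new second equation for y, then back-substitute into the first.
y = eq2_new[2] / eq2_new[1]
x = (eq1[2] - eq1[1] * y) / eq1[0]

print(x, y)  # 7/3 4/3
```

Using `Fraction` rather than floating-point numbers keeps the result as the exact values 7/3 and 4/3 found in the example.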


EXAMPLE 3 A Linear System with No Solutions 


Solve the linear system 

\[
\begin{aligned}
x + y &= 4 \\
3x + 3y &= 6
\end{aligned}
\]

Solution We can eliminate x from the second equation by adding $-3$ times the first equation to 
the second equation. This yields the simplified system 

\[
\begin{aligned}
x + y &= 4 \\
0 &= -6
\end{aligned}
\]

The second equation is contradictory, so the given system has no solution. Geometrically, this 
means that the lines corresponding to the equations in the original system are parallel and distinct. 
We leave it for you to check this by graphing the lines or by showing that they have the same slope 
but different y-intercepts. 
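
The slope check suggested at the end of Example 3 can be carried out as follows. This is a small sketch under the assumption that the coefficient of y is nonzero, so each line can be written as y = m*x + k; the helper name `slope_and_intercept` is hypothetical.

```python
from fractions import Fraction

def slope_and_intercept(a, b, c):
    """Rewrite a*x + b*y = c (with b != 0) as y = m*x + k; return (m, k)."""
    return Fraction(-a, b), Fraction(c, b)

m1, k1 = slope_and_intercept(1, 1, 4)  # x + y = 4
m2, k2 = slope_and_intercept(3, 3, 6)  # 3x + 3y = 6

print(m1, k1)  # -1 4
print(m2, k2)  # -1 2
print(m1 == m2 and k1 != k2)  # True: parallel and distinct
```

Equal slopes with different y-intercepts confirm that the two lines never meet, which matches the contradictory equation obtained by elimination.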


EXAMPLE 4 A Linear System with Infinitely Many Solutions 


Solve the linear system 

\[
\begin{aligned}
4x - 2y &= 1 \\
16x - 8y &= 4
\end{aligned}
\]

Solution We can eliminate x from the second equation by adding $-4$ times the first equation to 
the second. This yields the simplified system 

\[
\begin{aligned}
4x - 2y &= 1 \\
0 &= 0
\end{aligned}
\]

The second equation does not impose any restrictions on x and y and hence can be omitted. Thus, 
the solutions of the system are those values of x and y that satisfy the single equation 

\[
4x - 2y = 1 \tag{8}
\]

Geometrically, this means the lines corresponding to the two equations in the original system 
coincide. One way to describe the solution set is to solve this equation for x in terms of y to obtain 
$x = \frac{1}{4} + \frac{1}{2}y$ and then assign an arbitrary value t (called a parameter) to y. This allows us to 
express the solution by the pair of equations (called parametric equations) 

\[
x = \tfrac{1}{4} + \tfrac{1}{2}t, \quad y = t
\]

We can obtain specific numerical solutions from these equations by substituting numerical values 
for the parameter. For example, t = 0 yields the solution $\left(\frac{1}{4}, 0\right)$, t = 1 yields the solution $\left(\frac{3}{4}, 1\right)$, 
and t = $-1$ yields the solution $\left(-\frac{1}{4}, -1\right)$. You can confirm that these are solutions by 
substituting the coordinates into the given equations. 

In Example 4 we could have also obtained parametric equations for the solutions by 
solving (8) for y in terms of x, and letting x = t be the parameter. The resulting 
parametric equations would look different but would define the same solution set. 
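
The parametric description in Example 4 can be spot-checked numerically. The sketch below is illustrative only; the function names are hypothetical, and the second parametrization (x = t, y = 2t - 1/2) is the one obtained by solving 4x - 2y = 1 for y in terms of x.

```python
from fractions import Fraction

def point_from_y_parameter(t):
    """x = 1/4 + (1/2)t, y = t: the parametrization used in Example 4."""
    t = Fraction(t)
    return Fraction(1, 4) + Fraction(1, 2) * t, t

def point_from_x_parameter(t):
    """x = t, y = 2t - 1/2: the alternative obtained by solving for y."""
    t = Fraction(t)
    return t, 2 * t - Fraction(1, 2)

def satisfies_system(x, y):
    """Check both equations of the original system."""
    return 4 * x - 2 * y == 1 and 16 * x - 8 * y == 4

for t in (0, 1, -1):
    x, y = point_from_y_parameter(t)
    print(t, x, y, satisfies_system(x, y))

# Every point of the alternative parametrization also lies on the line.
print(all(satisfies_system(*point_from_x_parameter(t)) for t in range(-5, 6)))
```

The two parametrizations label the points of the line differently, but each produces only points that satisfy both original equations.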


EXAMPLE 5 A Linear System with Infinitely Many Solutions 


Solve the linear system 

\[
\begin{aligned}
x - y + 2z &= 5 \\
2x - 2y + 4z &= 10 \\
3x - 3y + 6z &= 15
\end{aligned}
\]

Solution This system can be solved by inspection, since the second and third equations are 
multiples of the first. Geometrically, this means that the three planes coincide and that those values 
of x, y, and z that satisfy the equation 

\[
x - y + 2z = 5 \tag{9}
\]

automatically satisfy all three equations. Thus, it suffices to find the solutions of (9). We can do this 
by first solving (9) for x in terms of y and z, then assigning arbitrary values r and s (parameters) to 
these two variables, and then expressing the solution by the three parametric equations 

\[
x = 5 + r - 2s, \quad y = r, \quad z = s
\]

Specific solutions can be obtained by choosing numerical values for the parameters r and s. For 
example, taking r = 1 and s = 0 yields the solution (6, 1, 0). 
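
As a quick check of Example 5, the two-parameter family can be substituted back into all three equations. This is a minimal sketch with hypothetical function names; it tests a grid of integer parameter values rather than proving the general statement.

```python
def solution_point(r, s):
    """Parametric solution of Example 5: x = 5 + r - 2s, y = r, z = s."""
    return 5 + r - 2 * s, r, s

def satisfies_all_three(x, y, z):
    """Check the three equations of the original system."""
    return (x - y + 2 * z == 5
            and 2 * x - 2 * y + 4 * z == 10
            and 3 * x - 3 * y + 6 * z == 15)

print(solution_point(1, 0))  # (6, 1, 0)

# Try every pair of integer parameters between -3 and 3.
print(all(satisfies_all_three(*solution_point(r, s))
          for r in range(-3, 4) for s in range(-3, 4)))  # True
```

Two parameters are needed here because a single equation in three unknowns leaves two of the unknowns free.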


Augmented Matrices and Elementary Row Operations 


As the number of equations and unknowns in a linear system increases, so does the complexity of the algebra 
involved in finding solutions. The required computations can be made more manageable by simplifying notation 
and standardizing procedures. For example, by mentally keeping track of the location of the +'s, the x's, and the 
='s in the linear system 


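\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\]

we can abbreviate the system by writing only the rectangular array of numbers 

\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{bmatrix}
\]

This is called the augmented matrix for the system. (As noted in the introduction to this chapter, the 
term “matrix” is used in mathematics to denote a rectangular array of numbers. In a later section 
we will study matrices in detail, but for now we will only be concerned with augmented matrices 
for linear systems.) For example, the augmented matrix for the system of equations 

\[
\begin{aligned}
x_1 + x_2 + 2x_3 &= 9 \\
2x_1 + 4x_2 - 3x_3 &= 1 \\
3x_1 + 6x_2 - 5x_3 &= 0
\end{aligned}
\quad \text{is} \quad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
2 & 4 & -3 & 1 \\
3 & 6 & -5 & 0
\end{bmatrix}
\]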
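
Forming an augmented matrix amounts to appending each right-hand constant to its row of coefficients. The sketch below illustrates this with plain Python lists; the function name `augmented_matrix` and the list-of-rows representation are assumptions made for the illustration.

```python
def augmented_matrix(coefficients, constants):
    """Append the constant b_i to row i of the coefficient array."""
    if len(coefficients) != len(constants):
        raise ValueError("need exactly one constant per equation")
    return [list(row) + [b] for row, b in zip(coefficients, constants)]

# x1 + x2 + 2x3 = 9,  2x1 + 4x2 - 3x3 = 1,  3x1 + 6x2 - 5x3 = 0
coefficients = [[1, 1, 2],
                [2, 4, -3],
                [3, 6, -5]]
constants = [9, 1, 0]

for row in augmented_matrix(coefficients, constants):
    print(row)
# [1, 1, 2, 9]
# [2, 4, -3, 1]
# [3, 6, -5, 0]
```

A system with an unknown missing from some equation would need an explicit 0 in that position, since the columns of the array stand for the unknowns in a fixed order.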


The basic method for solving a linear system is to perform appropriate algebraic operations on the system that do 
not alter the solution set and that produce a succession of increasingly simpler systems, until a point is reached 
where it can be ascertained whether the system is consistent, and if so, what its solutions are. Typically, the 
algebraic operations are as follows: 


1. Multiply an equation through by a nonzero constant. 
2. Interchange two equations. 
3. Add a constant times one equation to another. 


Since the rows (horizontal lines) of an augmented matrix correspond to the equations in the associated system, 
these three operations correspond to the following operations on the rows of the augmented matrix: 


1. Multiply a row through by a nonzero constant. 
2. Interchange two rows. 
3. Add a constant times one row to another. 


These are called elementary row operations on a matrix. 
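
The three elementary row operations translate directly into short functions. The following is a sketch only, with hypothetical function names, zero-based row indices, and exact `Fraction` entries; it modifies the matrix in place. The demonstration applies them to the augmented matrix of the system 5x + y = 3, 2x - y = 4 from the chapter introduction.

```python
from fractions import Fraction

def scale_row(matrix, i, c):
    """Operation 1: multiply row i through by a nonzero constant c."""
    if c == 0:
        raise ValueError("the constant must be nonzero")
    matrix[i] = [c * entry for entry in matrix[i]]

def swap_rows(matrix, i, j):
    """Operation 2: interchange rows i and j."""
    matrix[i], matrix[j] = matrix[j], matrix[i]

def add_multiple(matrix, src, dst, c):
    """Operation 3: add c times row src to row dst."""
    matrix[dst] = [c * s + d for s, d in zip(matrix[src], matrix[dst])]

m = [[Fraction(5), Fraction(1), Fraction(3)],
     [Fraction(2), Fraction(-1), Fraction(4)]]

add_multiple(m, src=1, dst=0, c=1)   # row 0 becomes [7, 0, 7]
scale_row(m, 0, Fraction(1, 7))      # row 0 becomes [1, 0, 1]
add_multiple(m, src=0, dst=1, c=-2)  # row 1 becomes [0, -1, 2]
scale_row(m, 1, -1)                  # row 1 becomes [0, 1, -2]

print([[str(entry) for entry in row] for row in m])
```

The final rows correspond to x = 1 and y = -2. The particular sequence of operations was chosen by hand for this small system; it is not a general procedure.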


In the following example we will illustrate how to use elementary row operations and an augmented matrix to 
solve a linear system in three unknowns. Since a systematic procedure for solving linear systems will be 
developed in the next section, do not worry about how the steps in the example were chosen. Your objective here 
should be simply to understand the computations. 


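EXAMPLE 6 Using Elementary Row Operations 


At each step below we solve a system of linear equations by operating on the equations in the 
system (shown on the left), and we solve the same system by operating on the rows of the 
augmented matrix (shown on the right). 

\[
\begin{aligned}
x + y + 2z &= 9 \\
2x + 4y - 3z &= 1 \\
3x + 6y - 5z &= 0
\end{aligned}
\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
2 & 4 & -3 & 1 \\
3 & 6 & -5 & 0
\end{bmatrix}
\]

Add $-2$ times the first equation to the second (add $-2$ times the first row to the second) 
to obtain 

\[
\begin{aligned}
x + y + 2z &= 9 \\
2y - 7z &= -17 \\
3x + 6y - 5z &= 0
\end{aligned}
\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
0 & 2 & -7 & -17 \\
3 & 6 & -5 & 0
\end{bmatrix}
\]

Add $-3$ times the first equation to the third (add $-3$ times the first row to the third) to 
obtain 

\[
\begin{aligned}
x + y + 2z &= 9 \\
2y - 7z &= -17 \\
3y - 11z &= -27
\end{aligned}
\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
0 & 2 & -7 & -17 \\
0 & 3 & -11 & -27
\end{bmatrix}
\]

Multiply the second equation by $\frac{1}{2}$ (multiply the second row by $\frac{1}{2}$) to obtain 

\[
\begin{aligned}
x + y + 2z &= 9 \\
y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\
3y - 11z &= -27
\end{aligned}
\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\
0 & 3 & -11 & -27
\end{bmatrix}
\]

Add $-3$ times the second equation to the third (add $-3$ times the second row to the third) 
to obtain 

\[
\begin{aligned}
x + y + 2z &= 9 \\
y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\
-\tfrac{1}{2}z &= -\tfrac{3}{2}
\end{aligned}
\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\
0 & 0 & -\tfrac{1}{2} & -\tfrac{3}{2}
\end{bmatrix}
\]

Multiply the third equation by $-2$ (multiply the third row by $-2$) to obtain 

\[
\begin{aligned}
x + y + 2z &= 9 \\
y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\
z &= 3
\end{aligned}
\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\
0 & 0 & 1 & 3
\end{bmatrix}
\]

Add $-1$ times the second equation to the first (add $-1$ times the second row to the first) 
to obtain 

\[
\begin{aligned}
x + \tfrac{11}{2}z &= \tfrac{35}{2} \\
y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\
z &= 3
\end{aligned}
\qquad
\begin{bmatrix}
1 & 0 & \tfrac{11}{2} & \tfrac{35}{2} \\
0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\
0 & 0 & 1 & 3
\end{bmatrix}
\]

Add $-\frac{11}{2}$ times the third equation to the first and $\frac{7}{2}$ times the third equation to the second 
(add $-\frac{11}{2}$ times the third row to the first and $\frac{7}{2}$ times the third row to the second) to obtain 

\[
\begin{aligned}
x &= 1 \\
y &= 2 \\
z &= 3
\end{aligned}
\qquad
\begin{bmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 2 \\
0 & 0 & 1 & 3
\end{bmatrix}
\]

The solution x = 1, y = 2, z = 3 is now evident. 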
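
The row operations of Example 6 can be replayed step by step on the augmented matrix. This is a minimal sketch, not code from the text: the helper names `scale` and `add` are hypothetical, rows are indexed from 0, and `Fraction` keeps the intermediate halves exact.

```python
from fractions import Fraction as F

def scale(matrix, i, c):
    """Multiply row i by the nonzero constant c."""
    matrix[i] = [c * v for v in matrix[i]]

def add(matrix, src, dst, c):
    """Add c times row src to row dst."""
    matrix[dst] = [c * s + d for s, d in zip(matrix[src], matrix[dst])]

M = [[F(1), F(1), F(2), F(9)],
     [F(2), F(4), F(-3), F(1)],
     [F(3), F(6), F(-5), F(0)]]

add(M, 0, 1, F(-2))      # add -2 times row 1 to row 2
add(M, 0, 2, F(-3))      # add -3 times row 1 to row 3
scale(M, 1, F(1, 2))     # multiply row 2 by 1/2
add(M, 1, 2, F(-3))      # add -3 times row 2 to row 3
scale(M, 2, F(-2))       # multiply row 3 by -2
add(M, 1, 0, F(-1))      # add -1 times row 2 to row 1
add(M, 2, 0, F(-11, 2))  # add -11/2 times row 3 to row 1
add(M, 2, 1, F(7, 2))    # add 7/2 times row 3 to row 2

for row in M:
    print([str(v) for v in row])
# ['1', '0', '0', '1']
# ['0', '1', '0', '2']
# ['0', '0', '1', '3']
```

The last column of the final matrix holds the solution x = 1, y = 2, z = 3. The comments use the one-based row numbering of the example, while the code uses zero-based indices.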


Historical Note The first known use of augmented matrices appeared between 200 B.C. 
and 100 B.C. in a Chinese manuscript entitled Nine Chapters of Mathematical Art. The 
coefficients were arranged in columns rather than in rows, as today, but remarkably the 
system was solved by performing a succession of operations on the columns. The actual 
use of the term augmented matrix appears to have been introduced by the American 
mathematician Maxime Bôcher in his book Introduction to Higher Algebra, published in 
1907. In addition to being an outstanding research mathematician and an expert in Latin, 
chemistry, philosophy, zoology, geography, meteorology, art, and music, Bôcher was an 
outstanding expositor of mathematics whose elementary textbooks were greatly 
appreciated by students and are still in demand today. 

[Image: Maxime Bôcher (1867-1918). Courtesy of the American Mathematical Society] 


Concept Review 


• Linear equation
• Homogeneous linear equation
• System of linear equations
• Solution of a linear system
• Ordered n-tuple
• Consistent linear system
• Inconsistent linear system
• Parameter
• Parametric equations
• Augmented matrix
• Elementary row operations


Skills 

• Determine whether a given equation is linear.
• Determine whether a given n-tuple is a solution of a linear system.
• Find the augmented matrix of a linear system.
• Find the linear system corresponding to a given augmented matrix.
• Perform elementary row operations on a linear system and on its corresponding augmented matrix.
• Determine whether a linear system is consistent or inconsistent.
• Find the set of solutions to a consistent linear system.


Exercise Set 1.1 


1. In each part, determine whether the equation is linear in $x_1$, $x_2$, and $x_3$.
(a) $x_1 + 5x_2 - \sqrt{2}\,x_3 = 1$
(b) $x_1 + 3x_2 + x_1x_3 = 2$
(c) $x_1 = -7x_2 + 3x_3$
(d) $x_1^{-2} + x_2 + 8x_3 = 5$
(e) $x_1^{3/5} - 2x_2 + x_3 = 4$
(f) $\pi x_1 - \sqrt{2}\,x_2 + \tfrac{1}{3}x_3 = 7^{1/3}$


Answer: 


(a), (c), and (f) are linear equations; (b), (d) and (e) are not linear equations 


2. In each part, determine whether the equations form a linear system. 


(a) $-2x + 4y + z = 2$
    $3x - \dfrac{2}{y} = 0$
(b) $x = 4$
    $2x = 8$
(c) $4x - y + 2z = -1$
    $-x + (\ln 2)y - 3z = 0$
(d) $3z + x = -4$
    $y + 5z = 1$
    $6x + 2z = 3$
    $-x - y - z = 4$
3. In each part, determine whether the equations form a linear system. 
(a) $2x_1 - x_4 = 5$
    $-x_1 + 5x_2 + 3x_3 - 2x_4 = -1$
(b) $\sin(2x_1 + x_3) = \sqrt{5}$
    $e^{-2x_2 - 2x_4} = \dfrac{1}{x_2}$
    $4x_4 = 4$
(c) $7x_1 - x_2 + 2x_3 = 0$
    $2x_1 + x_2 - x_3x_4 = 3$
    $-x_1 + 5x_2 - x_4 = 1$
(d) $x_1 + x_2 = x_3 + x_4$


Answer: 


(a) and (d) are linear systems; (b) and (c) are not linear systems 
4. For each system in Exercise 2 that is linear, determine whether it is consistent. 


5. For each system in Exercise 3 that is linear, determine whether it is consistent. 
Answer: 


(a) and (d) are both consistent 

6. Write a system of linear equations consisting of three equations in three unknowns with 
(a) no solutions. 
(b) exactly one solution. 


(c) infinitely many solutions. 


7. In each part, determine whether the given vector is a solution of the linear system 
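$$\begin{aligned} 2x_1 - 4x_2 - x_3 &= 1 \\ x_1 - 3x_2 + x_3 &= 1 \\ 3x_1 - 5x_2 - 3x_3 &= 1 \end{aligned}$$
(a) $(3, 1, 1)$
(b) $(3, -1, 1)$
(c) $(13, 5, 2)$
(d) $\left(\tfrac{13}{2}, \tfrac{5}{2}, 2\right)$
(e) $(17, 7, 5)$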
Answer: 


(a), (d), and (e) are solutions; (b) and (c) are not solutions 
8. In each part, determine whether the given vector is a solution of the linear system 
$$\begin{aligned} x_1 + 2x_2 - 2x_3 &= 3 \\ 3x_1 - x_2 + x_3 &= 1 \\ -x_1 + 5x_2 - 5x_3 &= 5 \end{aligned}$$


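(a) $\left(\tfrac{5}{7}, \tfrac{8}{7}, 1\right)$
(b) $\left(\tfrac{5}{7}, \tfrac{8}{7}, 0\right)$
(c) $(5, 8, 1)$
(d) $\left(\tfrac{5}{7}, \tfrac{10}{7}, \tfrac{2}{7}\right)$
(e) $\left(\tfrac{5}{7}, \tfrac{22}{7}, 2\right)$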


9. In each part, find the solution set of the linear equation by using parameters as necessary. 
(a) $7x - 5y = 3$
(b) 8x1 + 2x2 — 5x3 + 6x4=1 


Answer: 


(a) $x = \tfrac{3}{7} + \tfrac{5}{7}t$
    $y = t$
(b) x, = 1,235 rFoi-s 
    $x_2 = r$
    $x_3 = s$
    $x_4 = t$
10. In each part, find the solution set of the linear equation by using parameters as necessary. 
(a) $3x_1 - 5x_2 + 4x_3 = 7$
(b) $3v - 8w + 2x - y + 4z = 0$
11. In each part, find a system of linear equations corresponding to the given augmented matrix 
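(a) $\begin{bmatrix} 2 & 0 & 0 \\ 3 & -4 & 0 \\ 0 & 1 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 3 & 0 & -2 & 5 \\ 7 & 1 & 4 & -3 \\ 0 & -2 & 1 & 7 \end{bmatrix}$

(c) $\begin{bmatrix} 7 & 2 & 1 & -3 & 5 \\ 1 & 2 & 4 & 0 & 1 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 0 & 0 & 0 & 7 \\ 0 & 1 & 0 & 0 & -2 \\ 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 1 & 4 \end{bmatrix}$

Answer:
(a) $2x_1 = 0$
    $3x_1 - 4x_2 = 0$
    $x_2 = 1$
(b) $3x_1 - 2x_3 = 5$
    $7x_1 + x_2 + 4x_3 = -3$
    $-2x_2 + x_3 = 7$
(c) $7x_1 + 2x_2 + x_3 - 3x_4 = 5$
    $x_1 + 2x_2 + 4x_3 = 1$
(d) $x_1 = 7$
    $x_2 = -2$
    $x_3 = 3$
    $x_4 = 4$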


12. In each part, find a system of linear equations corresponding to the given augmented matrix. 


(a) $\begin{bmatrix} 2 & -1 \\ -4 & -6 \\ 1 & -1 \\ 3 & 0 \end{bmatrix}$


Gali 2 33. -4 


Al de pel 
5 mae Ys i 

=8 0: 0 3 

df 301-4 3 
ee ee ee 

oe ie en ae 

i ree ee 


13. In each part, find the augmented matrix for the given system of linear equations. 


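(a) $-2x_1 = 6$
    $3x_1 = 8$
    $9x_1 = -3$
(b) $6x_1 - x_2 + 3x_3 = 4$
    $5x_2 - x_3 = 1$
(c) $2x_2 - 3x_4 + x_5 = 0$
    $-3x_1 - x_2 + x_3 = -1$
    $6x_1 + 2x_2 - x_3 + 2x_4 - 3x_5 = 6$
(d) $x_1 - x_5 = 7$

Answer:

(a) $\begin{bmatrix} -2 & 6 \\ 3 & 8 \\ 9 & -3 \end{bmatrix}$

(b) $\begin{bmatrix} 6 & -1 & 3 & 4 \\ 0 & 5 & -1 & 1 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 2 & 0 & -3 & 1 & 0 \\ -3 & -1 & 1 & 0 & 0 & -1 \\ 6 & 2 & -1 & 2 & -3 & 6 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 0 & 0 & 0 & -1 & 7 \end{bmatrix}$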


14. In each part, find the augmented matrix for the given system of linear equations. 
(a) $3x_1 - 2x_2 = -1$
    $4x_1 + 5x_2 = 3$
    $7x_1 + 3x_2 = 2$
(b) $2x_1 + 2x_3 = 1$
    $3x_1 - x_2 + 4x_3 = 7$
    $6x_1 + x_2 - x_3 = 0$
(c) $x_1 + 2x_2 - x_4 + x_5 = 1$
    $3x_2 + x_3 - x_5 = 2$
    $x_3 + 7x_4 = 1$
(d) $x_1 = 1$
    $x_2 = 2$
    $x_3 = 3$


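15. The curve $y = ax^2 + bx + c$ shown in the accompanying figure passes through the points $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$. Show that the coefficients a, b, and c are a solution of the system of linear equations whose augmented matrix is

$$\begin{bmatrix} x_1^2 & x_1 & 1 & y_1 \\ x_2^2 & x_2 & 1 & y_2 \\ x_3^2 & x_3 & 1 & y_3 \end{bmatrix}$$

[Figure Ex-15: the curve $y = ax^2 + bx + c$ passing through $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$]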


16. Explain why each of the three elementary row operations does not affect the solution set of a linear system. 
17. Show that if the linear equations
$x_1 + kx_2 = c$ and $x_1 + lx_2 = d$
have the same solution set, then the two equations are identical (i.e., $k = l$ and $c = d$).


True-False Exercises 


In parts (a)-(h) determine whether the statement is true or false, and justify your answer. 
(a) A linear system whose equations are all homogeneous must be consistent. 
Answer: 


True 


(b) Multiplying a linear equation through by zero is an acceptable elementary row operation. 
Answer: 


False 
(c) The linear system
$$\begin{aligned} x - y &= 3 \\ 2x - 2y &= k \end{aligned}$$
cannot have a unique solution, regardless of the value of k.
Answer: 


True 


(d) A single linear equation with two or more unknowns must always have infinitely many solutions. 
Answer: 


True 


(e) If the number of equations in a linear system exceeds the number of unknowns, then the system must be 
inconsistent. 


Answer: 


False 


(f) If each equation in a consistent linear system is multiplied through by a constant c, then all solutions to the 
new system can be obtained by multiplying solutions from the original system by c. 


Answer: 


False 


(g) Elementary row operations permit one equation in a linear system to be subtracted from another. 
Answer: 


True 


(h) The linear system with corresponding augmented matrix
$$\begin{bmatrix} 2 & -1 & 4 \\ 0 & 0 & -1 \end{bmatrix}$$
is consistent.


Answer: 


False 




1.2 Gaussian Elimination 


In this section we will develop a systematic procedure for solving systems of linear equations. The procedure is based on
the idea of performing certain operations on the rows of the augmented matrix for the system that simplify it to a form
from which the solution of the system can be ascertained by inspection.


Considerations in Solving Linear Systems 


When considering methods for solving systems of linear equations, it is important to distinguish between large systems 
that must be solved by computer and small systems that can be solved by hand. For example, there are many applications 
that lead to linear systems in thousands or even millions of unknowns. Large systems require special techniques to deal 
with issues of memory size, roundoff errors, solution time, and so forth. Such techniques are studied in the field of 
numerical analysis and will only be touched on in this text. However, almost all of the methods that are used for large 
systems are based on the ideas that we will develop in this section. 


Echelon Forms 


In Example 6 of the last section, we solved a linear system in the unknowns x, y, and z by reducing the augmented matrix
to the form

$$\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{bmatrix}$$

from which the solution x = 1, y = 2, z = 3 became evident. This is an example of a matrix that is in reduced row
echelon form. To be of this form, a matrix must have the following properties:


1. If a row does not consist entirely of zeros, then the first nonzero number in the row is a 1. We call this a leading 1.

2. If there are any rows that consist entirely of zeros, then they are grouped together at the bottom of the matrix.

3. In any two successive rows that do not consist entirely of zeros, the leading 1 in the lower row occurs farther to the
right than the leading 1 in the higher row.

4. Each column that contains a leading 1 has zeros everywhere else in that column.


A matrix that has the first three properties is said to be in row echelon form. (Thus, a matrix in reduced row echelon 
form is of necessity in row echelon form, but not conversely.) 
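The four properties translate directly into a mechanical check. The following Python sketch is an illustration added here, not part of the original text; the function names are hypothetical and a matrix is assumed to be given as a list of rows.

```python
def leading_index(row):
    """Column index of the first nonzero entry, or None for a zero row."""
    for j, a in enumerate(row):
        if a != 0:
            return j
    return None

def is_row_echelon(M):
    """Check properties 1-3: leading 1's, zero rows at bottom, staircase."""
    seen_zero_row = False
    previous_lead = -1
    for row in M:
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True
            continue
        if seen_zero_row:            # property 2 violated
            return False
        if row[lead] != 1:           # property 1 violated
            return False
        if lead <= previous_lead:    # property 3 violated
            return False
        previous_lead = lead
    return True

def is_reduced_row_echelon(M):
    """Check properties 1-4."""
    if not is_row_echelon(M):
        return False
    for i, row in enumerate(M):
        lead = leading_index(row)
        if lead is None:
            continue
        # property 4: the leading 1 is the only nonzero entry in its column
        if any(M[k][lead] != 0 for k in range(len(M)) if k != i):
            return False
    return True

A = [[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3]]
B = [[1, 4, -3, 7], [0, 1, 6, 2], [0, 0, 1, 5]]
C = [[1, 2, 3], [0, 0, 0], [0, 0, 1]]
print(is_reduced_row_echelon(A), is_row_echelon(B), is_row_echelon(C))
```

Here `A` is the reduced matrix from Example 6 of the last section, `B` is in row echelon form only, and `C` fails property 2 because a zero row sits above a nonzero row.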


EXAMPLE 1 Row Echelon and Reduced Row Echelon Form 


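The following matrices are in reduced row echelon form.

$$\begin{bmatrix} 1 & 0 & 0 & 4 \\ 0 & 1 & 0 & 7 \\ 0 & 0 & 1 & -1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 & -2 & 0 & 1 \\ 0 & 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$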

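The following matrices are in row echelon form but not reduced row echelon form.

$$\begin{bmatrix} 1 & 4 & -3 & 7 \\ 0 & 1 & 6 & 2 \\ 0 & 0 & 1 & 5 \end{bmatrix}, \quad \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 & 2 & 6 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$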


EXAMPLE 2 More on Row Echelon and Reduced Row Echelon Form 


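As Example 1 illustrates, a matrix in row echelon form has zeros below each leading 1, whereas a matrix in
reduced row echelon form has zeros below and above each leading 1. Thus, with any real numbers substituted for
the *'s, all matrices of the following types are in row echelon form:

$$\begin{bmatrix} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 & * & * & * & * & * & * & * & * \\ 0 & 0 & 0 & 1 & * & * & * & * & * & * \\ 0 & 0 & 0 & 0 & 1 & * & * & * & * & * \\ 0 & 0 & 0 & 0 & 0 & 1 & * & * & * & * \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & * \end{bmatrix}$$

All matrices of the following types are in reduced row echelon form:

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 & * \\ 0 & 1 & 0 & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 & * & 0 & 0 & 0 & * & * & 0 & * \\ 0 & 0 & 0 & 1 & 0 & 0 & * & * & 0 & * \\ 0 & 0 & 0 & 0 & 1 & 0 & * & * & 0 & * \\ 0 & 0 & 0 & 0 & 0 & 1 & * & * & 0 & * \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & * \end{bmatrix}$$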


If, by a sequence of elementary row operations, the augmented matrix for a system of linear equations is put in reduced 
row echelon form, then the solution set can be obtained either by inspection or by converting certain linear equations to 
parametric form. Here are some examples. 


In Example 3 we could, if desired, express the 
solution more succinctly as the 4-tuple (3, —1, 0, 5). 


EXAMPLE 3 Unique Solution


Suppose that the augmented matrix for a linear system in the unknowns $x_1$, $x_2$, $x_3$, and $x_4$ has been reduced
by elementary row operations to

$$\begin{bmatrix} 1 & 0 & 0 & 0 & 3 \\ 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 5 \end{bmatrix}$$

This matrix is in reduced row echelon form and corresponds to the equations

$$\begin{aligned} x_1 &= 3 \\ x_2 &= -1 \\ x_3 &= 0 \\ x_4 &= 5 \end{aligned}$$

Thus, the system has a unique solution, namely, $x_1 = 3$, $x_2 = -1$, $x_3 = 0$, $x_4 = 5$.


EXAMPLE 4 Linear Systems in Three Unknowns 


In each part, suppose that the augmented matrix for a linear system in the unknowns x, y, and z has been 
reduced by elementary row operations to the given reduced row echelon form. Solve the system. 


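(a) $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$ (b) $\begin{bmatrix} 1 & 0 & 3 & -1 \\ 0 & 1 & -4 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}$ (c) $\begin{bmatrix} 1 & -5 & 1 & 4 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$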


Solution 
(a) The equation that corresponds to the last row of the augmented matrix is
$$0x + 0y + 0z = 1$$
Since this equation is not satisfied by any values of x, y, and z, the system is inconsistent.

(b) The equation that corresponds to the last row of the augmented matrix is
$$0x + 0y + 0z = 0$$
This equation can be omitted since it imposes no restrictions on x, y, and z; hence, the linear system
corresponding to the augmented matrix is
$$\begin{aligned} x + 3z &= -1 \\ y - 4z &= 2 \end{aligned}$$
Since x and y correspond to the leading 1's in the augmented matrix, we call these the leading
variables. The remaining variables (in this case z) are called free variables. Solving for the leading
variables in terms of the free variables gives
$$\begin{aligned} x &= -1 - 3z \\ y &= 2 + 4z \end{aligned}$$
From these equations we see that the free variable z can be treated as a parameter and assigned an
arbitrary value, t, which then determines values for x and y. Thus, the solution set can be represented
by the parametric equations
$$x = -1 - 3t, \quad y = 2 + 4t, \quad z = t$$
By substituting various values for t in these equations we can obtain various solutions of the system.
For example, setting t = 0 yields the solution
$$x = -1, \quad y = 2, \quad z = 0$$
and setting t = 1 yields the solution
$$x = -4, \quad y = 6, \quad z = 1$$

(c) As explained in part (b), we can omit the equations corresponding to the zero rows, in which case the
linear system associated with the augmented matrix consists of the single equation

$$x - 5y + z = 4 \tag{1}$$

from which we see that the solution set is a plane in three-dimensional space. Although (1) is a valid
form of the solution set, there are many applications in which it is preferable to express the solution
set in parametric form. We can convert (1) to parametric form by solving for the leading variable x in
terms of the free variables y and z to obtain

$$x = 4 + 5y - z$$

From this equation we see that the free variables can be assigned arbitrary values, say y = s and z = t,
which then determine the value of x. Thus, the solution set can be expressed parametrically as

$$x = 4 + 5s - t, \quad y = s, \quad z = t \tag{2}$$


We will usually denote parameters in a
general solution by the letters r, s, t, ..., but
any letters that do not conflict with the names
of the unknowns can be used. For systems
with more than three unknowns, subscripted
letters such as $t_1$, $t_2$, $t_3$, ... are convenient.


Formulas, such as (2), that express the solution set of a linear system parametrically have some associated terminology.


DEFINITION 1 


If a linear system has infinitely many solutions, then a set of parametric equations from which all solutions can
be obtained by assigning numerical values to the parameters is called a general solution of the system.
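To make the idea of a general solution concrete, here is a small Python sketch (an added illustration, not part of the original text). It assumes the augmented matrix is already in reduced row echelon form and that the system is consistent; the function name `solution_from_rref` is hypothetical. Free variables receive the supplied parameter values, and each leading variable is then computed from its row.

```python
def solution_from_rref(R, params):
    """Return one solution of a consistent system whose augmented
    matrix R is in reduced row echelon form.

    params lists values for the free variables, in left-to-right order.
    """
    n = len(R[0]) - 1                  # number of unknowns
    lead_row = {}                      # leading column -> its row index
    for i, row in enumerate(R):
        for j in range(n):
            if row[j] != 0:
                lead_row[j] = i
                break
    free = [j for j in range(n) if j not in lead_row]
    x = [0] * n
    for j, value in zip(free, params):
        x[j] = value
    for j, i in lead_row.items():
        # leading variable = constant minus the free-variable terms
        x[j] = R[i][n] - sum(R[i][k] * x[k] for k in free)
    return x

# Example 4(b): x + 3z = -1, y - 4z = 2, with z = t free.
Rb = [[1, 0, 3, -1], [0, 1, -4, 2], [0, 0, 0, 0]]
print(solution_from_rref(Rb, [0]))   # t = 0
print(solution_from_rref(Rb, [1]))   # t = 1

# Example 4(c): x - 5y + z = 4, with y = s and z = t free.
Rc = [[1, -5, 1, 4], [0, 0, 0, 0], [0, 0, 0, 0]]
print(solution_from_rref(Rc, [1, 2]))  # s = 1, t = 2
```

The first two calls reproduce the particular solutions obtained in Example 4(b) for t = 0 and t = 1.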


Elimination Methods 


We have just seen how easy it is to solve a system of linear equations once its augmented matrix is in reduced row 
echelon form. Now we will give a step-by-step elimination procedure that can be used to reduce any matrix to reduced 
row echelon form. As we state each step in the procedure, we illustrate the idea by reducing the following matrix to 
reduced row echelon form. 


$$\begin{bmatrix} 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -10 & 6 & 12 & 28 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}$$


Step 1. Locate the leftmost column that does not consist entirely of zeros.

$$\begin{bmatrix} 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -10 & 6 & 12 & 28 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}$$

The leftmost nonzero column is the first column.


Step 2. Interchange the top row with another row, if necessary, to bring a nonzero entry to the top of the column found in
Step 1.

$$\begin{bmatrix} 2 & 4 & -10 & 6 & 12 & 28 \\ 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}$$

The first and second rows in the preceding matrix were interchanged.


Step 3. If the entry that is now at the top of the column found in Step 1 is a, multiply the first row by 1/a in order to
introduce a leading 1.

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}$$

The first row of the preceding matrix was multiplied by 1/2.


Step 4. Add suitable multiples of the top row to the rows below so that all entries below the leading 1 become zeros.

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & -2 & 0 & 7 & 12 \\ 0 & 0 & 5 & 0 & -17 & -29 \end{bmatrix}$$

−2 times the first row of the preceding matrix was added to the third row.


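Step 5. Now cover the top row in the matrix and begin again with Step 1 applied to the submatrix that remains. Continue
in this way until the entire matrix is in row echelon form.

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & -2 & 0 & 7 & 12 \\ 0 & 0 & 5 & 0 & -17 & -29 \end{bmatrix}$$

The leftmost nonzero column in the submatrix is the third column.

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\ 0 & 0 & 5 & 0 & -17 & -29 \end{bmatrix}$$

The first row in the submatrix was multiplied by −1/2 to introduce a leading 1.

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\ 0 & 0 & 0 & 0 & \tfrac{1}{2} & 1 \end{bmatrix}$$

−5 times the first row of the submatrix was added to the second row of the submatrix to introduce a zero below the leading 1.

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\ 0 & 0 & 0 & 0 & \tfrac{1}{2} & 1 \end{bmatrix}$$

The top row in the submatrix was covered, and we returned again to Step 1. The leftmost nonzero column in the new submatrix is the fifth column.

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}$$

The first (and only) row in the new submatrix was multiplied by 2 to introduce a leading 1.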


The entire matrix is now in row echelon form. To find the reduced row echelon form we need the following additional 
step. 

Step 6. Beginning with the last nonzero row and working upward, add suitable multiples of each row to the rows above
to introduce zeros above the leading 1's.

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}$$

7/2 times the third row of the preceding matrix was added to the second row.

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 0 & 2 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}$$

−6 times the third row was added to the first row.

$$\begin{bmatrix} 1 & 2 & 0 & 3 & 0 & 7 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}$$

5 times the second row was added to the first row.

The last matrix is in reduced row echelon form.


The procedure (or algorithm) we have just described for reducing a matrix to reduced row echelon form is called Gauss- 
Jordan elimination. This algorithm consists of two parts, a forward phase in which zeros are introduced below the 
leading 1's and then a backward phase in which zeros are introduced above the leading 1's. If only the forward phase is 
used, then the procedure produces a row echelon form only and is called Gaussian elimination. For example, in the 
preceding computations a row echelon form was obtained at the end of Step 5. 
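The six steps above can be sketched in Python as follows. This is an added illustration under stated assumptions, not the book's own code: the function name `gauss_jordan` is hypothetical, the matrix is a list of rows, and `Fraction` is used so that the arithmetic is exact. The forward phase follows Steps 1 through 5 and the backward phase follows Step 6.

```python
from fractions import Fraction

def gauss_jordan(rows):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    M = [[Fraction(a) for a in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivots = []          # (row, column) of each leading 1
    top = 0              # first row of the current submatrix

    # Forward phase (Steps 1-5): produce a row echelon form.
    for col in range(n_cols):
        if top == n_rows:
            break
        # Steps 1-2: find a row at or below `top` with a nonzero entry.
        r = next((i for i in range(top, n_rows) if M[i][col] != 0), None)
        if r is None:
            continue     # this column is zero in the submatrix
        M[top], M[r] = M[r], M[top]
        # Step 3: scale to introduce a leading 1.
        a = M[top][col]
        M[top] = [x / a for x in M[top]]
        # Step 4: clear the entries below the leading 1.
        for i in range(top + 1, n_rows):
            c = M[i][col]
            M[i] = [x - c * p for x, p in zip(M[i], M[top])]
        pivots.append((top, col))
        top += 1         # Step 5: cover the top row and repeat

    # Backward phase (Step 6): clear the entries above each leading 1.
    for r, col in reversed(pivots):
        for i in range(r):
            c = M[i][col]
            M[i] = [x - c * p for x, p in zip(M[i], M[r])]
    return M

A = [[0, 0, -2, 0, 7, 12],
     [2, 4, -10, 6, 12, 28],
     [2, 4, -5, 6, -5, -1]]
R = gauss_jordan(A)
for row in R:
    print([str(x) for x in row])
```

Stopping after the forward phase would give Gaussian elimination; the result `R` agrees with the reduced row echelon form obtained by hand at the end of Step 6.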


Carl Friedrich Gauss (1777-1855)


Wilhelm Jordan (1842-1899)


Historical Note Although versions of Gaussian elimination were known much earlier, the power of the method
was not recognized until the great German mathematician Carl Friedrich Gauss used it to compute the orbit of
the asteroid Ceres from limited data. What happened was this: On January 1, 1801 the Sicilian astronomer
Giuseppe Piazzi (1746-1826) noticed a dim celestial object that he believed might be a “missing planet.” He
named the object Ceres and made a limited number of positional observations but then lost the object as it neared
the Sun. Gauss undertook the problem of computing the orbit from the limited data using least squares and the
procedure that we now call Gaussian elimination. The work of Gauss caused a sensation when Ceres reappeared
a year later in the constellation Virgo at almost the precise position that Gauss predicted! The method was further
popularized by the German engineer Wilhelm Jordan in his handbook on geodesy (the science of measuring
Earth shapes) entitled Handbuch der Vermessungskunde and published in 1888.

[Images: Granger Collection (Gauss); Wikipedia (Jordan)]


EXAMPLE 5 Gauss-Jordan Elimination 


Solve by Gauss-Jordan elimination. 


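$$\begin{aligned} x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\ 2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= -1 \\ 5x_3 + 10x_4 + 15x_6 &= 5 \\ 2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 &= 6 \end{aligned}$$


Solution The augmented matrix for the system is

$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 & -1 \\ 0 & 0 & 5 & 10 & 0 & 15 & 5 \\ 2 & 6 & 0 & 8 & 4 & 18 & 6 \end{bmatrix}$$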


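Adding −2 times the first row to the second and fourth rows gives

$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 & 0 \\ 0 & 0 & -1 & -2 & 0 & -3 & -1 \\ 0 & 0 & 5 & 10 & 0 & 15 & 5 \\ 0 & 0 & 4 & 8 & 0 & 18 & 6 \end{bmatrix}$$

Multiplying the second row by −1 and then adding −5 times the new second row to the third row and −4
times the new second row to the fourth row gives

$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 6 & 2 \end{bmatrix}$$

Interchanging the third and fourth rows and then multiplying the third row of the resulting matrix by 1/6
gives the row echelon form

$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

This completes the forward phase since there are zeros below the leading 1's.

Adding −3 times the third row to the second row and then adding 2 times the second row of the resulting
matrix to the first row yields the reduced row echelon form

$$\begin{bmatrix} 1 & 3 & 0 & 4 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

This completes the backward phase since there are zeros above the leading 1's.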


The corresponding system of equations is

$$\begin{aligned} x_1 + 3x_2 + 4x_4 + 2x_5 &= 0 \\ x_3 + 2x_4 &= 0 \\ x_6 &= \tfrac{1}{3} \end{aligned} \tag{3}$$

Note that in constructing the linear system in
(3) we ignored the row of zeros in the
corresponding augmented matrix. Why is this
justified?

Solving for the leading variables we obtain

$$\begin{aligned} x_1 &= -3x_2 - 4x_4 - 2x_5 \\ x_3 &= -2x_4 \\ x_6 &= \tfrac{1}{3} \end{aligned}$$

Finally, we express the general solution of the system parametrically by assigning the free variables $x_2$, $x_4$,
and $x_5$ arbitrary values r, s, and t, respectively. This yields

$$x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = \tfrac{1}{3}$$
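A general solution can always be checked by substituting it back into the original equations. The Python sketch below is an added illustration (the helper names are hypothetical); it tests the general solution of Example 5 against all four equations for a grid of integer parameter values, using exact fractions for the value 1/3.

```python
from fractions import Fraction
from itertools import product

# Coefficients and right-hand sides of the system in Example 5.
A = [[1, 3, -2, 0, 2, 0],
     [2, 6, -5, -2, 4, -3],
     [0, 0, 5, 10, 0, 15],
     [2, 6, 0, 8, 4, 18]]
b = [0, -1, 5, 6]

def general_solution(r, s, t):
    """General solution of Example 5 with free variables x2=r, x4=s, x5=t."""
    return [-3 * r - 4 * s - 2 * t, r, -2 * s, s, t, Fraction(1, 3)]

def satisfies(x):
    """True if x satisfies every equation of the system exactly."""
    return all(sum(a * xi for a, xi in zip(row, x)) == rhs
               for row, rhs in zip(A, b))

# Try every (r, s, t) with integer entries from -2 to 2.
all_ok = all(satisfies(general_solution(r, s, t))
             for r, s, t in product(range(-2, 3), repeat=3))
print(all_ok)
```

A finite check like this cannot prove the formula for all parameter values, but it catches sign and coefficient slips quickly.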


Homogeneous Linear Systems 


A system of linear equations is said to be homogeneous if the constant terms are all zero; that is, the system has the form

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0 \end{aligned}$$

Every homogeneous system of linear equations is consistent because all such systems have $x_1 = 0$, $x_2 = 0$, ..., $x_n = 0$ as
a solution. This solution is called the trivial solution; if there are other solutions, they are called nontrivial solutions.


Because a homogeneous linear system always has the trivial solution, there are only two possibilities for its solutions:
• The system has only the trivial solution.
• The system has infinitely many solutions in addition to the trivial solution.
In the special case of a homogeneous linear system of two equations in two unknowns, say

$$\begin{aligned} a_1x + b_1y &= 0 \quad (a_1, b_1 \text{ not both zero}) \\ a_2x + b_2y &= 0 \quad (a_2, b_2 \text{ not both zero}) \end{aligned}$$

the graphs of the equations are lines through the origin, and the trivial solution corresponds to the point of intersection at
the origin (Figure 1.2.1).


[Figure 1.2.1: The lines $a_1x + b_1y = 0$ and $a_2x + b_2y = 0$ through the origin. Panels: "Only the trivial solution" and "Infinitely many solutions".]


There is one case in which a homogeneous system is assured of having nontrivial solutions—namely, whenever the 
system involves more unknowns than equations. To see why, consider the following example of four equations in six 
unknowns. 


EXAMPLE 6 A Homogeneous System


Use Gauss-Jordan elimination to solve the homogeneous linear system

$$\begin{aligned} x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\ 2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= 0 \\ 5x_3 + 10x_4 + 15x_6 &= 0 \\ 2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 &= 0 \end{aligned} \tag{4}$$


Solution Observe first that the coefficients of the unknowns in this system are the same as those in 
Example 5; that is, the two systems differ only in the constants on the right side. The augmented matrix for 
the given homogeneous system is 


$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 & 0 \\ 0 & 0 & 5 & 10 & 0 & 15 & 0 \\ 2 & 6 & 0 & 8 & 4 & 18 & 0 \end{bmatrix} \tag{5}$$

which is the same as the augmented matrix for the system in Example 5, except for zeros in the last
column. Thus, the reduced row echelon form of this matrix will be the same as that of the augmented
matrix in Example 5, except for the last column. However, a moment's reflection will make it evident that
a column of zeros is not changed by an elementary row operation, so the reduced row echelon form of (5) is

$$\begin{bmatrix} 1 & 3 & 0 & 4 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \tag{6}$$

The corresponding system of equations is

$$\begin{aligned} x_1 + 3x_2 + 4x_4 + 2x_5 &= 0 \\ x_3 + 2x_4 &= 0 \\ x_6 &= 0 \end{aligned}$$

Solving for the leading variables we obtain

$$\begin{aligned} x_1 &= -3x_2 - 4x_4 - 2x_5 \\ x_3 &= -2x_4 \\ x_6 &= 0 \end{aligned} \tag{7}$$

If we now assign the free variables $x_2$, $x_4$, and $x_5$ arbitrary values r, s, and t, respectively, then we can
express the solution set parametrically as

$$x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = 0$$

Note that the trivial solution results when r = s = t = 0.


Free Variables in Homogeneous Linear Systems


Example 6 illustrates two important points about solving homogeneous linear systems: 


1. Elementary row operations do not alter columns of zeros in a matrix, so the reduced row echelon form of the 
augmented matrix for a homogeneous linear system has a final column of zeros. This implies that the linear system 
corresponding to the reduced row echelon form is homogeneous, just like the original system. 


2. When we constructed the homogeneous linear system corresponding to augmented matrix (6), we ignored the row of
zeros because the corresponding equation
$$0x_1 + 0x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6 = 0$$
does not impose any conditions on the unknowns. Thus, depending on whether or not the reduced row echelon form
of the augmented matrix for a homogeneous linear system has any rows of zeros, the linear system corresponding to
that reduced row echelon form will either have the same number of equations as the original system or it will have
fewer.


Now consider a general homogeneous linear system with n unknowns, and suppose that the reduced row echelon form of
the augmented matrix has r nonzero rows. Since each nonzero row has a leading 1, and since each leading 1 corresponds
to a leading variable, the homogeneous system corresponding to the reduced row echelon form of the augmented matrix
must have r leading variables and n − r free variables. Thus, this system is of the form

$$\begin{aligned} x_{k_1} + \textstyle\sum(\;) &= 0 \\ x_{k_2} + \textstyle\sum(\;) &= 0 \\ &\;\;\vdots \\ x_{k_r} + \textstyle\sum(\;) &= 0 \end{aligned} \tag{8}$$

where in each equation the expression $\sum(\;)$ denotes a sum that involves the free variables, if any [see (7), for example]. In
summary, we have the following result.


THEOREM 1.2.1 Free Variable Theorem for Homogeneous Systems 


If a homogeneous linear system has n unknowns, and if the reduced row echelon form of its augmented matrix 
has r nonzero rows, then the system has n - r free variables. 


Note that Theorem 1.2.2 applies only to
homogeneous systems; a nonhomogeneous system
with more unknowns than equations need not be
consistent. However, we will prove later that if a
nonhomogeneous system with more unknowns than
equations is consistent, then it has infinitely many
solutions.


Theorem 1.2.1 has an important implication for homogeneous linear systems with more unknowns than equations.
Specifically, if a homogeneous linear system has m equations in n unknowns, and if m < n, then it must also be true that
r < n (why?). This being the case, the theorem implies that there is at least one free variable, and this implies in turn that
the system has infinitely many solutions. Thus, we have the following result.


THEOREM 1.2.2 


A homogeneous linear system with more unknowns than equations has infinitely many solutions. 


In retrospect, we could have anticipated that the homogeneous system in Example 6 would have infinitely many 
solutions since it has four equations in six unknowns. 
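The count n − r in Theorem 1.2.1 can be read off a reduced row echelon form mechanically. The short Python sketch below is an added illustration with hypothetical function names; it assumes the augmented matrix is already in reduced row echelon form and numbers the unknowns from 1, as in the text.

```python
def leading_columns(R):
    """1-based column numbers of the leading 1's (last column excluded)."""
    n = len(R[0]) - 1
    cols = []
    for row in R:
        for j in range(n):
            if row[j] != 0:
                cols.append(j + 1)
                break
    return cols

def free_variables(R):
    """1-based indices of the free variables."""
    n = len(R[0]) - 1
    lead = set(leading_columns(R))
    return [j for j in range(1, n + 1) if j not in lead]

# Reduced row echelon form (6) from Example 6.
R6 = [[1, 3, 0, 4, 2, 0, 0],
      [0, 0, 1, 2, 0, 0, 0],
      [0, 0, 0, 0, 0, 1, 0],
      [0, 0, 0, 0, 0, 0, 0]]

n = len(R6[0]) - 1                 # number of unknowns
r = len(leading_columns(R6))       # number of nonzero rows
print(n, r, n - r, free_variables(R6))
```

For Example 6 this gives n = 6, r = 3, and the three free variables x2, x4, and x5, in agreement with the general solution found there.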


Gaussian Elimination and Back-Substitution 


For small linear systems that are solved by hand (such as most of those in this text), Gauss-Jordan elimination (reduction 
to reduced row echelon form) is a good procedure to use. However, for large linear systems that require a computer 
solution, it is generally more efficient to use Gaussian elimination (reduction to row echelon form) followed by a 
technique known as back-substitution to complete the process of solving the system. The next example illustrates this 
technique. 


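EXAMPLE 7 Example 5 Solved by Back-Substitution


From the computations in Example 5, a row echelon form of the augmented matrix is

$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

To solve the corresponding system of equations

$$\begin{aligned} x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\ x_3 + 2x_4 + 3x_6 &= 1 \\ x_6 &= \tfrac{1}{3} \end{aligned}$$

we proceed as follows:
Step 1. Solve the equations for the leading variables.

$$\begin{aligned} x_1 &= -3x_2 + 2x_3 - 2x_5 \\ x_3 &= 1 - 2x_4 - 3x_6 \\ x_6 &= \tfrac{1}{3} \end{aligned}$$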


Step 2. Beginning with the bottom equation and working upward, successively substitute each equation
into all the equations above it.

Substituting $x_6 = \tfrac{1}{3}$ into the second equation yields

$$\begin{aligned} x_1 &= -3x_2 + 2x_3 - 2x_5 \\ x_3 &= -2x_4 \\ x_6 &= \tfrac{1}{3} \end{aligned}$$

Substituting $x_3 = -2x_4$ into the first equation yields

$$\begin{aligned} x_1 &= -3x_2 - 4x_4 - 2x_5 \\ x_3 &= -2x_4 \\ x_6 &= \tfrac{1}{3} \end{aligned}$$

Step 3. Assign arbitrary values to the free variables, if any.
If we now assign $x_2$, $x_4$, and $x_5$ the arbitrary values r, s, and t, respectively, the general solution is given by
the formulas

$$x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = \tfrac{1}{3}$$


This agrees with the solution obtained in Example 5. 


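EXAMPLE 8


Suppose that the matrices below are augmented matrices for linear systems in the unknowns $x_1$, $x_2$, $x_3$, and
$x_4$. These matrices are all in row echelon form but not reduced row echelon form. Discuss the existence
and uniqueness of solutions to the corresponding linear systems.

(a) $\begin{bmatrix} 1 & -3 & 7 & 2 & 5 \\ 0 & 1 & 2 & -4 & 1 \\ 0 & 0 & 1 & 6 & 9 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$ (b) $\begin{bmatrix} 1 & -3 & 7 & 2 & 5 \\ 0 & 1 & 2 & -4 & 1 \\ 0 & 0 & 1 & 6 & 9 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$ (c) $\begin{bmatrix} 1 & -3 & 7 & 2 & 5 \\ 0 & 1 & 2 & -4 & 1 \\ 0 & 0 & 1 & 6 & 9 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}$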


Solution 
(a) The last row corresponds to the equation
$$0x_1 + 0x_2 + 0x_3 + 0x_4 = 1$$
from which it is evident that the system is inconsistent.

(b) The last row corresponds to the equation
$$0x_1 + 0x_2 + 0x_3 + 0x_4 = 0$$
which has no effect on the solution set. In the remaining three equations the variables $x_1$, $x_2$, and $x_3$
correspond to leading 1's and hence are leading variables. The variable $x_4$ is a free variable. With a
little algebra, the leading variables can be expressed in terms of the free variable, and the free variable
can be assigned an arbitrary value. Thus, the system must have infinitely many solutions.

(c) The last row corresponds to the equation
$$x_4 = 0$$
which gives us a numerical value for $x_4$. If we substitute this value into the third equation, namely,
$$x_3 + 6x_4 = 9$$
we obtain $x_3 = 9$. You should now be able to see that if we continue this process and substitute the
known values of $x_3$ and $x_4$ into the equation corresponding to the second row, we will obtain a unique
numerical value for $x_2$; and if, finally, we substitute the known values of $x_4$, $x_3$, and $x_2$ into the
equation corresponding to the first row, we will produce a unique numerical value for $x_1$. Thus, the
system has a unique solution.
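The substitution process described in part (c) is back-substitution in its simplest setting, where every variable is a leading variable. The Python sketch below is an added illustration, not the book's code; the function name `back_substitute` is hypothetical, and it assumes an n × (n + 1) augmented matrix in row echelon form with a leading 1 in each row. As sample data it uses the row echelon form reached in Example 6 of Section 1.1.

```python
from fractions import Fraction as F

def back_substitute(U):
    """Solve a system whose n x (n+1) augmented matrix U is in row
    echelon form with a leading 1 on every diagonal position."""
    n = len(U)
    x = [F(0)] * n
    # Work upward from the bottom equation.
    for i in range(n - 1, -1, -1):
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = U[i][n] - known
    return x

# Row echelon form of x + y + 2z = 9, 2x + 4y - 3z = 1, 3x + 6y - 5z = 0.
U = [[1, 1, 2, 9],
     [0, 1, F(-7, 2), F(-17, 2)],
     [0, 0, 1, 3]]
x = back_substitute(U)
print([str(v) for v in x])
```

Each pass of the loop uses only values already computed from the rows below, which is why the bottom equation must be handled first.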


Some Facts About Echelon Forms 


There are three facts about row echelon forms and reduced row echelon forms that are important to know but we will not 

prove: 

1. Every matrix has a unique reduced row echelon form; that is, regardless of whether you use Gauss-Jordan elimination 
or some other sequence of elementary row operations, the same reduced row echelon form will result in the end. 

2. Row echelon forms are not unique; that is, different sequences of elementary row operations can result in different 
row echelon forms. 

3. Although row echelon forms are not unique, all row echelon forms of a matrix A have the same number of zero rows,
and the leading 1's always occur in the same positions in the row echelon forms of A. Those are called the pivot
positions of A. A column that contains a pivot position is called a pivot column of A.


EXAMPLE 9 Pivot Positions and Columns


Earlier in this section (immediately after Definition 1) we found a row echelon form of

$$A = \begin{bmatrix} 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -10 & 6 & 12 & 28 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}$$

to be

$$\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}$$

The leading 1's occur in positions (row 1, column 1), (row 2, column 3), and (row 3, column 5). These are
the pivot positions. The pivot columns are columns 1, 3, and 5.
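Reading pivot positions off a row echelon form is a simple scan for the first nonzero entry in each row. The Python sketch below is an added illustration with a hypothetical function name; it assumes the input is already in row echelon form and reports 1-based (row, column) pairs to match the text.

```python
from fractions import Fraction as F

def pivot_positions(ref):
    """Return 1-based (row, column) pairs of the leading entries
    of a matrix that is already in row echelon form."""
    positions = []
    for i, row in enumerate(ref):
        for j, a in enumerate(row):
            if a != 0:
                positions.append((i + 1, j + 1))
                break
    return positions

# Row echelon form of A from Example 9.
ref_A = [[1, 2, -5, 3, 6, 14],
         [0, 0, 1, 0, F(-7, 2), -6],
         [0, 0, 0, 0, 1, 2]]
positions = pivot_positions(ref_A)
pivot_columns = [col for _, col in positions]
print(positions, pivot_columns)
```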


Roundoff Error and Instability 


There is often a gap between mathematical theory and its practical implementation—Gauss-Jordan elimination and 
Gaussian elimination being good examples. The problem is that computers generally approximate numbers, thereby 
introducing roundoff errors, so unless precautions are taken, successive calculations may degrade an answer to a degree 
that makes it useless. Algorithms (procedures) in which this happens are called unstable. There are various techniques 
for minimizing roundoff error and instability. For example, it can be shown that for large linear systems Gauss-Jordan 
elimination involves roughly 50% more operations than Gaussian elimination, so most computer algorithms are based on 
the latter method. Some of these matters will be considered in Chapter 9. 
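A tiny floating-point experiment shows how roundoff can ruin an elimination. The Python sketch below is an added illustration, not taken from the text: it solves a 2 × 2 system whose first coefficient is very small, once exactly as the steps of this section prescribe and once after first interchanging rows so that the larger entry is used to eliminate. The row interchange is one standard precaution (often called partial pivoting); it is an assumption of this sketch and is not described in this section.

```python
def solve_2x2(a, b, e, c, d, f, pivot):
    """Solve a*x + b*y = e, c*x + d*y = f by elimination in floating point.

    If pivot is True, the rows are swapped first when |c| > |a|.
    """
    if pivot and abs(c) > abs(a):
        a, b, e, c, d, f = c, d, f, a, b, e
    m = c / a              # multiplier used to eliminate x from row 2
    d2 = d - m * b
    f2 = f - m * e
    y = f2 / d2            # back-substitution
    x = (e - b * y) / a
    return x, y

# System: 1e-20*x + y = 1,  x + y = 2.  The true solution has x and y
# both extremely close to 1.
naive = solve_2x2(1e-20, 1.0, 1.0, 1.0, 1.0, 2.0, pivot=False)
pivoted = solve_2x2(1e-20, 1.0, 1.0, 1.0, 1.0, 2.0, pivot=True)
print(naive)    # (0.0, 1.0): x is badly wrong
print(pivoted)  # (1.0, 1.0)
```

Without the interchange, the huge multiplier 1/1e-20 swamps the other entries of the second row, and the computed value of x is 0 rather than approximately 1.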


Concept Review 


• Reduced row echelon form
• Row echelon form
• Leading 1
• Leading variables
• Free variables
• General solution to a linear system
• Gaussian elimination
• Gauss-Jordan elimination
• Forward phase
• Backward phase
• Homogeneous linear system
• Trivial solution
• Nontrivial solution
• Free Variable Theorem for Homogeneous Systems
• Back-substitution


Skills 
• Recognize whether a given matrix is in row echelon form, reduced row echelon form, or neither.
• Construct solutions to linear systems whose corresponding augmented matrices are in row echelon form or
reduced row echelon form.
• Use Gaussian elimination to find the general solution of a linear system.
• Use Gauss-Jordan elimination to find the general solution of a linear system.
• Analyze homogeneous linear systems using the Free Variable Theorem for Homogeneous Systems.


Exercise Set 1.2 


1. In each part, determine whether the matrix is in row echelon form, reduced row echelon form, both, or neither. 


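(a) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 0 & 3 & 1 \\ 0 & 1 & 2 & 4 \end{bmatrix}$

(e) $\begin{bmatrix} 1 & 2 & 0 & 3 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

(f) $\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$

(g) $\begin{bmatrix} 1 & -7 & 5 & 5 \\ 0 & 1 & 3 & 2 \end{bmatrix}$


Answer:

(a) Both
(b) Both
(c) Both
(d) Both
(e) Both
(f) Both
(g) Row echelon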


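2. In each part, determine whether the matrix is in row echelon form, reduced row echelon form, both, or neither.

(a) $\begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 3 & 4 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 5 & -3 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

(e) $\begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(f) $\begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 0 & 7 & 1 & 3 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

(g) $\begin{bmatrix} 1 & -2 & 0 & 1 \\ 0 & 0 & 1 & -2 \end{bmatrix}$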


3. In each part, suppose that the augmented matrix for a system of linear equations has been reduced by row operations to the given reduced row echelon form. Solve the system.


(a) 


| 
" 
| 


aon 


— £ co 
| 
KF Or Dee TD KBP WA UM ~~ 
MO W DH 


OW HA CO 


ow! UW 


(c) 


Coot OHO 
I 


(d) 


oor OFC COre OC re COCO Ke 
ohLhwy OOF Nh 


on 


Answer: 


(a) $x_1 = -37,\ x_2 = -8,\ x_3 = 5$

(b) $x_1 = 13t - 10,\ x_2 = 13t - 5,\ x_3 = -t + 2,\ x_4 = t$

(c) $x_1 = -7s + 2t - 11,\ x_2 = s,\ x_3 = -3t - 4,\ x_4 = -3t + 9,\ x_5 = t$
(d) Inconsistent 


4. In each part, suppose that the augmented matrix for a system of linear equations has been reduced by row operations 
to the given reduced row echelon form. Solve the system. 


(a) $\begin{bmatrix} 1 & 0 & 0 & -3 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 7 \end{bmatrix}$

oe Pe er ee 
010 3 2 
O-0-1 ~ 1s 

Chi a6610 3259 
GC Oia 7 
0 0015 8 
0 0000 O 

(d) $\begin{bmatrix} 1 & -3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$


In Exercises 5—8, solve the linear system by Gauss-Jordan elimination. 


5. $$\begin{aligned} x_1 + x_2 + 2x_3 &= 8 \\ -x_1 - 2x_2 + 3x_3 &= 1 \\ 3x_1 - 7x_2 + 4x_3 &= 10 \end{aligned}$$


Answer:


$x_1 = 3,\ x_2 = 1,\ x_3 = 2$


6. $$\begin{aligned} 2x_1 + 2x_2 + 2x_3 &= 0 \\ -2x_1 + 5x_2 + 2x_3 &= 1 \\ 8x_1 + x_2 + 4x_3 &= -1 \end{aligned}$$

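7. $$\begin{aligned} x - y + 2z - w &= -1 \\ 2x + y - 2z - 2w &= -2 \\ -x + 2y - 4z + w &= 1 \\ 3x - 3w &= -3 \end{aligned}$$


Answer:


$x = t - 1,\ y = 2s,\ z = s,\ w = t$


8. $$\begin{aligned} -2b + 3c &= 1 \\ 3a + 6b - 3c &= -2 \\ 6a + 6b + 3c &= 5 \end{aligned}$$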


In Exercises 9-12, solve the linear system by Gaussian elimination. 
9. Exercise 5 


Answer: 


$x_1 = 3,\ x_2 = 1,\ x_3 = 2$
10. Exercise 6 


11. Exercise 7 
Answer: 


$x = t - 1,\ y = 2s,\ z = s,\ w = t$


12. Exercise 8 


In Exercises 13—16, determine whether the homogeneous system has nontrivial solutions by inspection (without pencil 
and paper). 


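13. $$\begin{aligned} 2x_1 - 3x_2 + 4x_3 - x_4 &= 0 \\ 7x_1 + x_2 - 8x_3 + 9x_4 &= 0 \\ 2x_1 + 8x_2 + x_3 - x_4 &= 0 \end{aligned}$$


Answer:


Has nontrivial solutions


14. $$\begin{aligned} x_1 + 3x_2 - x_3 &= 0 \\ x_2 - 8x_3 &= 0 \\ 4x_3 &= 0 \end{aligned}$$


15. $$\begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= 0 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= 0 \end{aligned}$$


Answer:


Has nontrivial solutions

16. $$\begin{aligned} 3x_1 - 2x_2 &= 0 \\ 6x_1 - 4x_2 &= 0 \end{aligned}$$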


In Exercises 17—24, solve the given homogeneous linear system by any method. 


17. 2x, + x2+3x3 = 0 
X1 + 2x2 = 0 
0 


Answer: 


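$x_1 = 0,\ x_2 = 0,\ x_3 = 0$

18. $$\begin{aligned} 2x - y - 3z &= 0 \\ -x + 2y - 3z &= 0 \\ x + y + 4z &= 0 \end{aligned}$$


19. $$\begin{aligned} 3x_1 + x_2 + x_3 + x_4 &= 0 \\ 5x_1 - x_2 + x_3 - x_4 &= 0 \end{aligned}$$

Answer:


$x_1 = -s,\ x_2 = -t - s,\ x_3 = 4s,\ x_4 = t$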
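20. $$\begin{aligned} v + 3w - 2x &= 0 \\ 2u + v - 4w + 3x &= 0 \\ 2u + 3v + 2w - x &= 0 \\ -4u - 3v + 5w - 4x &= 0 \end{aligned}$$

21. $$\begin{aligned} 2x + 2y + 4z &= 0 \\ w - y - 3z &= 0 \\ 2w + 3x + y + z &= 0 \\ -2w + x + 3y - 2z &= 0 \end{aligned}$$


Answer:


$w = t,\ x = -t,\ y = t,\ z = 0$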
22. *14+3x2 +2x%4 = O 
x1 44x94 2x3 =. 0 
=—2x,=—2x3=—%x4 = 0 

2x, —4x9+ x34+%4 = 0 
xy, —2x2— x34+%x%4 = O 


23. $$\begin{aligned} 2I_1 - I_2 + 3I_3 + 4I_4 &= 9 \\ I_1 - 2I_3 + 7I_4 &= 11 \\ 3I_1 - 3I_2 + I_3 + 5I_4 &= 8 \\ 2I_1 + I_2 + 4I_3 + 4I_4 &= 10 \end{aligned}$$

Answer:

$I_1 = -1,\ I_2 = 0,\ I_3 = 1,\ I_4 = 2$
24, 23+ Z44+Z5=0 
—2Z,— 23+ 223 —32Z4+2Z5=0 
Zi + £3z- 223 =—2Z5=0 


22, +22Z2;-— £3 +25=0 


In Exercises 25–28, determine the values of a for which the system has no solutions, exactly one solution, or infinitely many solutions.


25. $$\begin{aligned} x + 2y - 3z &= 4 \\ 3x - y + 5z &= 2 \\ 4x + y + (a^2 - 14)z &= a + 2 \end{aligned}$$

Answer:

If $a = 4$, there are infinitely many solutions; if $a = -4$, there are no solutions; if $a \neq \pm 4$, there is exactly one solution.

26. $$\begin{aligned} x + 2y + z &= 2 \\ 2x - 2y + 3z &= 1 \\ x + 2y - (a^2 - 3)z &= a \end{aligned}$$

27. $$\begin{aligned} x + 2y &= 1 \\ 2x + (a^2 - 5)y &= a - 1 \end{aligned}$$

Answer:

If $a = 3$, there are infinitely many solutions; if $a = -3$, there are no solutions; if $a \neq \pm 3$, there is exactly one solution.

28. $$\begin{aligned} x + y + 7z &= -7 \\ 2x + 3y + 17z &= -16 \\ x + 2y + (a^2 + 1)z &= 3a \end{aligned}$$


In Exercises 29-30, solve the following systems, where a, b, and c are constants. 


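29. $$\begin{aligned} 2x + y &= a \\ 3x + 6y &= b \end{aligned}$$

Answer:

$x = \dfrac{2a}{3} - \dfrac{b}{9},\quad y = -\dfrac{a}{3} + \dfrac{2b}{9}$

30. $$\begin{aligned} x_1 + x_2 + x_3 &= a \\ 2x_1 + 2x_3 &= b \\ 3x_2 + 3x_3 &= c \end{aligned}$$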


31. Find two different row echelon forms of 


3] 


This exercise shows that a matrix can have multiple row echelon forms. 


Answer: 
[2 om[s torre 


32. Reduce 


0 =-2 =—29 
3. 4 5 
to reduced row echelon form without introducing fractions at any intermediate stage. 
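33. Show that the following nonlinear system has 18 solutions if $0 \le \alpha \le 2\pi$, $0 \le \beta \le 2\pi$, and $0 \le \gamma \le 2\pi$.

$$\begin{aligned} \sin\alpha + 2\cos\beta + 3\tan\gamma &= 0 \\ 2\sin\alpha + 5\cos\beta + 3\tan\gamma &= 0 \\ -\sin\alpha - 5\cos\beta + 5\tan\gamma &= 0 \end{aligned}$$

[Hint: Begin by making the substitutions $x = \sin\alpha$, $y = \cos\beta$, and $z = \tan\gamma$.]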


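34. Solve the following system of nonlinear equations for the unknown angles $\alpha$, $\beta$, and $\gamma$, where $0 \le \alpha \le 2\pi$, $0 \le \beta \le 2\pi$, and $0 \le \gamma < \pi$.

$$\begin{aligned} 2\sin\alpha - \cos\beta + 3\tan\gamma &= 3 \\ 4\sin\alpha + 2\cos\beta - 2\tan\gamma &= 2 \\ 6\sin\alpha - 3\cos\beta + \tan\gamma &= 9 \end{aligned}$$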


35. Solve the following system of nonlinear equations for x, y, and z. 


x? 4 y? + z = 


[Hint: Begin by making the substitutions $X = x^2$, $Y = y^2$, $Z = z^2$.]


Answer: 


x= +1, y= +3, z= + 2 


36. Solve the following system for x, y, and z. 


Lig 4 

zy 2 1 

2 oy 8 

xy 2 0 

1,3, 10 
te = 5 


37. Find the coefficients a, b, c, and d so that the curve shown in the accompanying figure is the graph of the equation $y = ax^3 + bx^2 + cx + d$.


Figure Ex-37 


Answer: 


$a = 1,\ b = -6,\ c = 2,\ d = 10$
38. Find the coefficients a, b, c, and d so that the curve shown in the accompanying figure is given by the equation $ax^2 + ay^2 + bx + cy + d = 0$.


Figure Ex-38 


39. If the linear system 


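$$\begin{aligned} a_1x + b_1y + c_1z &= 0 \\ a_2x - b_2y + c_2z &= 0 \\ a_3x + b_3y - c_3z &= 0 \end{aligned}$$

has only the trivial solution, what can be said about the solutions of the following system?

$$\begin{aligned} a_1x + b_1y + c_1z &= 3 \\ a_2x - b_2y + c_2z &= 7 \\ a_3x + b_3y - c_3z &= 11 \end{aligned}$$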


Answer: 


The nonhomogeneous system will have exactly one solution. 


40. (a) If A is a $3 \times 5$ matrix, then what is the maximum possible number of leading 1's in its reduced row echelon form?


(b) If B is a $3 \times 6$ matrix whose last column has all zeros, then what is the maximum possible number of parameters in the general solution of the linear system with augmented matrix B?


(c) If Cis a4 x 3 matrix, then what is the minimum possible number of rows of zeros in any row echelon form of 
C? 


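41. (a) Prove that if $ad - bc \neq 0$, then the reduced row echelon form of

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \text{ is } \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

(b) Use the result in part (a) to prove that if $ad - bc \neq 0$, then the linear system

$$\begin{aligned} ax + by &= k \\ cx + dy &= l \end{aligned}$$

has exactly one solution.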
42. Consider the system of equations 
ax-+by = 0 
cx+dy = 0 
ex+fy = 0 


Discuss the relative positions of the lines $ax + by = 0$, $cx + dy = 0$, and $ex + fy = 0$ when (a) the system has only the trivial solution, and (b) the system has nontrivial solutions.


43. Describe all possible reduced row echelon forms of 


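(a) $\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$

(b) $\begin{bmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & p & q \end{bmatrix}$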


True-False Exercises 


In parts (a)–(i) determine whether the statement is true or false, and justify your answer.
(a) If a matrix is in reduced row echelon form, then it is also in row echelon form. 
Answer: 


True 


(b) If an elementary row operation is applied to a matrix that is in row echelon form, the resulting matrix will still be in 
row echelon form. 


Answer: 


False 


(c) Every matrix has a unique row echelon form. 
Answer: 


False 


(d) A homogeneous linear system in n unknowns whose corresponding augmented matrix has a reduced row echelon form with r leading 1's has n − r free variables.


Answer: 


True 


(e) All leading 1's in a matrix in row echelon form must occur in different columns. 
Answer: 
True 

(f) If every column of a matrix in row echelon form has a leading 1, then all entries that are not leading 1's are zero.
Answer: 


False 


(g) If a homogeneous linear system of n equations in n unknowns has a corresponding augmented matrix with a reduced row echelon form containing n leading 1's, then the linear system has only the trivial solution.


Answer: 


True 


(h) If the reduced row echelon form of the augmented matrix for a linear system has a row of zeros, then the system must 
have infinitely many solutions. 


Answer: 


False 


(i) If a linear system has more unknowns than equations, then it must have infinitely many solutions. 
Answer: 


False 




1.3 Matrices and Matrix Operations 


Rectangular arrays of real numbers arise in contexts other than as augmented matrices for linear systems. In this 
section we will begin to study matrices as objects in their own right by defining operations of addition, subtraction, 
and multiplication on them. 


Matrix Notation and Terminology 


In Section 1.2 we used rectangular arrays of numbers, called augmented matrices, to abbreviate systems of linear 
equations. However, rectangular arrays of numbers occur in other contexts as well. For example, the following 
rectangular array with three rows and seven columns might describe the number of hours that a student spent studying 
three subjects during a certain week: 


            Mon.  Tues.  Wed.  Thurs.  Fri.  Sat.  Sun.
Math         2     3      2     4       1     4     2
History      0     3      1     4       3     2     2
Language     4     1      3     1       0     0     2


If we suppress the headings, then we are left with the following rectangular array of numbers with three rows and 
seven columns, called a “matrix”: 


$$\begin{bmatrix} 2 & 3 & 2 & 4 & 1 & 4 & 2 \\ 0 & 3 & 1 & 4 & 3 & 2 & 2 \\ 4 & 1 & 3 & 1 & 0 & 0 & 2 \end{bmatrix}$$


More generally, we make the following definition. 


DEFINITION 1 


A matrix is a rectangular array of numbers. The numbers in the array are called the entries in the matrix. 


A matrix with only one column is called a column vector or a column matrix, and a matrix with only one row is called a row vector or a row matrix. In Example 1, the $2 \times 1$ matrix is a column vector, the $1 \times 4$ matrix is a row vector, and the $1 \times 1$ matrix is both a row vector and a column vector.


EXAMPLE 1 Examples of Matrices 


Some examples of matrices are 


oO Pp 
— 
tho 
—s 
oS 
I 
a) 
— 
Lo] 

onl = 
rs 
| 

J — 
| 
— 
fs 
i 


The size of a matrix is described in terms of the number of rows (horizontal lines) and columns (vertical lines) it contains. For example, the first matrix in Example 1 has three rows and two columns, so its size is 3 by 2 (written $3 \times 2$). In a size description, the first number always denotes the number of rows, and the second denotes the number of columns. The remaining matrices in Example 1 have sizes $1 \times 4$, $3 \times 3$, $2 \times 1$, and $1 \times 1$, respectively.


We will use capital letters to denote matrices and lowercase letters to denote numerical quantities; thus we might write 
a: 49 abe 
A= or C= 
342 def 
When discussing matrices, it is common to refer to numerical quantities as scalars. Unless stated otherwise, scalars 
will be real numbers; complex scalars will be considered later in the text. 


Matrix brackets are often omitted from $1 \times 1$ matrices, making it impossible to tell, for example, whether the symbol 4 denotes the number “four” or the matrix [4]. This rarely causes problems because it is usually possible to tell which is meant from the context.


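The entry that occurs in row $i$ and column $j$ of a matrix $A$ will be denoted by $a_{ij}$. Thus a general $3 \times 4$ matrix might be written as

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix}$$

and a general $m \times n$ matrix as

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \tag{1}$$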


When a compact notation is desired, the preceding matrix can be written as

$$[a_{ij}]_{m \times n} \quad \text{or} \quad [a_{ij}]$$

the first notation being used when it is important in the discussion to know the size, and the second being used when the size need not be emphasized. Usually, we will match the letter denoting a matrix with the letter denoting its entries; thus, for a matrix $B$ we would generally use $b_{ij}$ for the entry in row $i$ and column $j$, and for a matrix $C$ we would use the notation $c_{ij}$.


The entry in row $i$ and column $j$ of a matrix $A$ is also commonly denoted by the symbol $(A)_{ij}$. Thus, for matrix (1) above, we have

$$(A)_{ij} = a_{ij}$$

and for the matrix

$$A = \begin{bmatrix} 2 & -3 \\ 7 & 0 \end{bmatrix}$$

we have $(A)_{11} = 2$, $(A)_{12} = -3$, $(A)_{21} = 7$, and $(A)_{22} = 0$.
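Row and column vectors are of special importance, and it is common practice to denote them by boldface lowercase letters rather than capital letters. For such matrices, double subscripting of the entries is unnecessary. Thus a general $1 \times n$ row vector $\mathbf{a}$ and a general $m \times 1$ column vector $\mathbf{b}$ would be written as

$$\mathbf{a} = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$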
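A matrix $A$ with $n$ rows and $n$ columns is called a square matrix of order $n$, and the entries $a_{11}, a_{22}, \ldots, a_{nn}$ in (2) are said to be on the main diagonal of $A$.

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \tag{2}$$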


Operations on Matrices 


So far, we have used matrices to abbreviate the work in solving systems of linear equations. For other applications, 
however, it is desirable to develop an “arithmetic of matrices” in which matrices can be added, subtracted, and 
multiplied in a useful way. The remainder of this section will be devoted to developing this arithmetic. 


DEFINITION 2 


Two matrices are defined to be equal if they have the same size and their corresponding entries are equal. 


The equality of two matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ of the same size can be expressed either by writing

$$(A)_{ij} = (B)_{ij}$$

or by writing

$$a_{ij} = b_{ij}$$

where it is understood that the equalities hold for all values of $i$ and $j$.


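EXAMPLE 2 Equality of Matrices


Consider the matrices

$$A = \begin{bmatrix} 2 & 1 \\ 3 & x \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 1 \\ 3 & 5 \end{bmatrix}, \quad C = \begin{bmatrix} 2 & 1 & 0 \\ 3 & 4 & 0 \end{bmatrix}$$

If $x = 5$, then $A = B$, but for all other values of $x$ the matrices $A$ and $B$ are not equal, since not all of their corresponding entries are equal. There is no value of $x$ for which $A = C$ since $A$ and $C$ have different sizes.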


DEFINITION 3 


If $A$ and $B$ are matrices of the same size, then the sum $A + B$ is the matrix obtained by adding the entries of $B$ to the corresponding entries of $A$, and the difference $A - B$ is the matrix obtained by subtracting the entries of $B$ from the corresponding entries of $A$. Matrices of different sizes cannot be added or subtracted.


In matrix notation, if $A = [a_{ij}]$ and $B = [b_{ij}]$ have the same size, then

$$(A + B)_{ij} = (A)_{ij} + (B)_{ij} = a_{ij} + b_{ij} \quad \text{and} \quad (A - B)_{ij} = (A)_{ij} - (B)_{ij} = a_{ij} - b_{ij}$$


EXAMPLE 3 Addition and Subtraction 


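Consider the matrices

$$A = \begin{bmatrix} 2 & 1 & 0 & 3 \\ -1 & 0 & 2 & 4 \\ 4 & -2 & 7 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} -4 & 3 & 5 & 1 \\ 2 & 2 & 0 & -1 \\ 3 & 2 & -4 & 5 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix}$$

Then

$$A + B = \begin{bmatrix} -2 & 4 & 5 & 4 \\ 1 & 2 & 2 & 3 \\ 7 & 0 & 3 & 5 \end{bmatrix} \quad \text{and} \quad A - B = \begin{bmatrix} 6 & -2 & -5 & 2 \\ -3 & -2 & 2 & 5 \\ 1 & -4 & 11 & -5 \end{bmatrix}$$

The expressions $A + C$, $B + C$, $A - C$, and $B - C$ are undefined.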
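
As a minimal sketch of Definition 3, the entrywise rule can be written in a few lines of Python, assuming a matrix is stored as a list of rows. The helper names `same_size`, `mat_add`, and `mat_sub` are illustrative choices, not notation from the text.

```python
def same_size(A, B):
    """True when A and B have the same number of rows and columns."""
    return len(A) == len(B) and all(len(ra) == len(rb) for ra, rb in zip(A, B))

def mat_add(A, B):
    """Entrywise sum; defined only for matrices of the same size."""
    if not same_size(A, B):
        raise ValueError("A + B is undefined: the sizes differ")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    """Entrywise difference; defined only for matrices of the same size."""
    if not same_size(A, B):
        raise ValueError("A - B is undefined: the sizes differ")
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# The matrices of Example 3
A = [[2, 1, 0, 3], [-1, 0, 2, 4], [4, -2, 7, 0]]
B = [[-4, 3, 5, 1], [2, 2, 0, -1], [3, 2, -4, 5]]
C = [[1, 1], [2, 2]]

print(mat_add(A, B))
print(mat_sub(A, B))
```

Calling `mat_add(A, C)` raises an error, which mirrors the statement that $A + C$ is undefined.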


DEFINITION 4 


If A is any matrix and c is any scalar, then the product cA is the matrix obtained by multiplying each entry of 
the matrix A by c. The matrix cA is said to be a scalar multiple of A. 


In matrix notation, if $A = [a_{ij}]$, then

$$(cA)_{ij} = c(A)_{ij} = ca_{ij}$$


EXAMPLE 4 Scalar Multiples 


For the matrices 
we have 
It is common practice to denote (— 1)B by —B. 
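
The scalar multiple of Definition 4 can be sketched the same way. The function name `scalar_multiple` and the small matrix `M` below are hypothetical, chosen only to illustrate the rule $(cA)_{ij} = ca_{ij}$; they are not the matrices of Example 4.

```python
def scalar_multiple(c, A):
    """Multiply every entry of the matrix A (a list of rows) by the scalar c."""
    return [[c * a for a in row] for row in A]

# Hypothetical 2 x 3 matrix used only for illustration
M = [[0, 2, 7], [-1, 3, -5]]

print(scalar_multiple(2, M))    # each entry doubled
print(scalar_multiple(-1, M))   # the matrix usually written -M
```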


Thus far we have defined multiplication of a matrix by a scalar but not the multiplication of two matrices. Since matrices are added by adding corresponding entries and subtracted by subtracting corresponding entries, it would seem natural to define multiplication of matrices by multiplying corresponding entries. However, it turns out that such a definition would not be very useful for most problems. Experience has led mathematicians to the following more useful definition of matrix multiplication.


DEFINITION 5 


If $A$ is an $m \times r$ matrix and $B$ is an $r \times n$ matrix, then the product $AB$ is the $m \times n$ matrix whose entries are determined as follows: To find the entry in row $i$ and column $j$ of $AB$, single out row $i$ from the matrix $A$ and column $j$ from the matrix $B$. Multiply the corresponding entries from the row and column together, and then add up the resulting products.


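EXAMPLE 5 Multiplying Matrices


Consider the matrices

$$A = \begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix}$$

Since $A$ is a $2 \times 3$ matrix and $B$ is a $3 \times 4$ matrix, the product $AB$ is a $2 \times 4$ matrix. To determine, for example, the entry in row 2 and column 3 of $AB$, we single out row 2 from $A$ and column 3 from $B$. Then, as illustrated below, we multiply corresponding entries together and add up these products.

$$\begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix} \begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix} = \begin{bmatrix} \square & \square & \square & \square \\ \square & \square & 26 & \square \end{bmatrix}$$

$$(2 \cdot 4) + (6 \cdot 3) + (0 \cdot 5) = 26$$

The entry in row 1 and column 4 of $AB$ is computed as follows:

$$(1 \cdot 3) + (2 \cdot 1) + (4 \cdot 2) = 13$$

The computations for the remaining entries are

$$\begin{aligned} (1 \cdot 4) + (2 \cdot 0) + (4 \cdot 2) &= 12 \\ (1 \cdot 1) - (2 \cdot 1) + (4 \cdot 7) &= 27 \\ (1 \cdot 4) + (2 \cdot 3) + (4 \cdot 5) &= 30 \\ (2 \cdot 4) + (6 \cdot 0) + (0 \cdot 2) &= 8 \\ (2 \cdot 1) - (6 \cdot 1) + (0 \cdot 7) &= -4 \\ (2 \cdot 3) + (6 \cdot 1) + (0 \cdot 2) &= 12 \end{aligned}$$

$$AB = \begin{bmatrix} 12 & 27 & 30 & 13 \\ 8 & -4 & 26 & 12 \end{bmatrix}$$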
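
The row-column rule of Definition 5 can be sketched in Python as follows, assuming matrices are stored as lists of rows. The function name `mat_mul` is an illustrative choice; the data are the matrices $A$ and $B$ of Example 5.

```python
def mat_mul(A, B):
    """Product of an m x r matrix A and an r x n matrix B by the row-column rule."""
    r = len(B)
    if any(len(row) != r for row in A):
        raise ValueError("AB is undefined: columns of A must equal rows of B")
    n = len(B[0])
    # Entry (i, j) is the sum of products of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(r)) for j in range(n)]
            for i in range(len(A))]

A = [[1, 2, 4], [2, 6, 0]]
B = [[4, 1, 4, 3], [0, -1, 3, 1], [2, 7, 5, 2]]

AB = mat_mul(A, B)
print(AB)
```

Attempting `mat_mul(B, A)` raises an error, since $B$ has four columns while $A$ has only two rows.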


The definition of matrix multiplication requires that the number of columns of the first factor $A$ be the same as the number of rows of the second factor $B$ in order to form the product $AB$. If this condition is not satisfied, the product is undefined. A convenient way to determine whether a product of two matrices is defined is to write down the size of the first factor and, to the right of it, write down the size of the second factor. If, as in (3), the inside numbers are the same, then the product is defined. The outside numbers then give the size of the product.

$$\underset{m \times r}{A} \quad \underset{r \times n}{B} \;=\; \underset{m \times n}{AB} \tag{3}$$

(Inside numbers: $r$ and $r$. Outside numbers: $m$ and $n$.)


Gotthold Eisenstein (1823-1852) 


Historical Note The concept of matrix multiplication is due to the German mathematician Gotthold 
Eisenstein, who introduced the idea around 1844 to simplify the process of making substitutions in linear 
systems. The idea was then expanded on and formalized by Cayley in his Memoir on the Theory of Matrices 
that was published in 1858. Eisenstein was a pupil of Gauss, who ranked him as the equal of Isaac Newton 
and Archimedes. However, Eisenstein, suffering from bad health his entire life, died at age 30, so his potential 
was never realized. 

[Image: wikipedia] 


EXAMPLE 6 Determining Whether a Product Is Defined


Suppose that $A$, $B$, and $C$ are matrices with the following sizes:

$$\underset{3 \times 4}{A} \qquad \underset{4 \times 7}{B} \qquad \underset{7 \times 3}{C}$$

Then by (3), $AB$ is defined and is a $3 \times 7$ matrix; $BC$ is defined and is a $4 \times 3$ matrix; and $CA$ is defined and is a $7 \times 4$ matrix. The products $AC$, $CB$, and $BA$ are all undefined.
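
The inside/outside test in (3) is easy to mechanize. The sketch below assumes a size is represented as a `(rows, columns)` tuple; the function name `product_size` is hypothetical.

```python
def product_size(size_a, size_b):
    """Size of the product as (rows, cols), or None when the product is undefined."""
    (m, r), (s, n) = size_a, size_b
    # The inside numbers r and s must agree; the outside numbers give the size.
    return (m, n) if r == s else None

# The sizes from Example 6
sizes = {"A": (3, 4), "B": (4, 7), "C": (7, 3)}

for left, right in ["AB", "BC", "CA", "AC", "CB", "BA"]:
    print(left + right, product_size(sizes[left], sizes[right]))
```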


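In general, if $A = [a_{ij}]$ is an $m \times r$ matrix and $B = [b_{ij}]$ is an $r \times n$ matrix, then, as illustrated in (4) by row $i$ of $A$ and column $j$ of $B$,

$$AB = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1r} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ir} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mr} \end{bmatrix} \begin{bmatrix} b_{11} & \cdots & b_{1j} & \cdots & b_{1n} \\ b_{21} & \cdots & b_{2j} & \cdots & b_{2n} \\ \vdots & & \vdots & & \vdots \\ b_{r1} & \cdots & b_{rj} & \cdots & b_{rn} \end{bmatrix} \tag{4}$$

the entry $(AB)_{ij}$ in row $i$ and column $j$ of $AB$ is given by

$$(AB)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + a_{i3}b_{3j} + \cdots + a_{ir}b_{rj} \tag{5}$$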


Partitioned Matrices 


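A matrix can be subdivided or partitioned into smaller matrices by inserting horizontal and vertical rules between selected rows and columns. For example, the following are three possible partitions of a general $3 \times 4$ matrix $A$: the first is a partition of $A$ into four submatrices $A_{11}$, $A_{12}$, $A_{21}$, and $A_{22}$; the second is a partition of $A$ into its row vectors $\mathbf{r}_1$, $\mathbf{r}_2$, and $\mathbf{r}_3$; and the third is a partition of $A$ into its column vectors $\mathbf{c}_1$, $\mathbf{c}_2$, $\mathbf{c}_3$, and $\mathbf{c}_4$:

$$A = \left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ \hline a_{31} & a_{32} & a_{33} & a_{34} \end{array}\right] = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

$$A = \left[\begin{array}{cccc} a_{11} & a_{12} & a_{13} & a_{14} \\ \hline a_{21} & a_{22} & a_{23} & a_{24} \\ \hline a_{31} & a_{32} & a_{33} & a_{34} \end{array}\right] = \begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \mathbf{r}_3 \end{bmatrix}$$

$$A = \left[\begin{array}{c|c|c|c} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{array}\right] = \begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 & \mathbf{c}_3 & \mathbf{c}_4 \end{bmatrix}$$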
a34 


Matrix Multiplication by Columns and by Rows 


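Partitioning has many uses, one of which is for finding particular rows or columns of a matrix product $AB$ without computing the entire product. Specifically, the following formulas, whose proofs are left as exercises, show how individual column vectors of $AB$ can be obtained by partitioning $B$ into column vectors and how individual row vectors of $AB$ can be obtained by partitioning $A$ into row vectors.

$$AB = A\begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n \end{bmatrix} = \begin{bmatrix} A\mathbf{b}_1 & A\mathbf{b}_2 & \cdots & A\mathbf{b}_n \end{bmatrix} \tag{6}$$

($AB$ computed column by column)

$$AB = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_m \end{bmatrix} B = \begin{bmatrix} \mathbf{a}_1 B \\ \mathbf{a}_2 B \\ \vdots \\ \mathbf{a}_m B \end{bmatrix} \tag{7}$$

($AB$ computed row by row)

In words, these formulas state that

$$j\text{th column vector of } AB = A\,[\,j\text{th column vector of } B\,] \tag{8}$$

$$i\text{th row vector of } AB = [\,i\text{th row vector of } A\,]\,B \tag{9}$$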


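EXAMPLE 7 Example 5 Revisited


If $A$ and $B$ are the matrices in Example 5, then from (8) the second column vector of $AB$ can be obtained by the computation

$$\begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 7 \end{bmatrix} = \begin{bmatrix} 27 \\ -4 \end{bmatrix}$$

(the column on the left is the second column of $B$; the result is the second column of $AB$), and from (9) the first row vector of $AB$ can be obtained by the computation

$$\begin{bmatrix} 1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix} = \begin{bmatrix} 12 & 27 & 30 & 13 \end{bmatrix}$$

(the row on the left is the first row of $A$; the result is the first row of $AB$).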
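
Formulas (8) and (9) can be checked numerically without forming the whole product. The following is a sketch under the assumption that matrices are lists of rows; the helpers `column`, `mat_vec`, and `vec_mat` are illustrative names.

```python
A = [[1, 2, 4], [2, 6, 0]]
B = [[4, 1, 4, 3], [0, -1, 3, 1], [2, 7, 5, 2]]

def column(M, j):
    """Column j of M (0-based) as a flat list."""
    return [row[j] for row in M]

def mat_vec(M, x):
    """Matrix times column vector: one dot product per row of M."""
    return [sum(m * xk for m, xk in zip(row, x)) for row in M]

def vec_mat(r, M):
    """Row vector times matrix: one dot product per column of M."""
    return [sum(rk * M[k][j] for k, rk in enumerate(r)) for j in range(len(M[0]))]

# Formula (8): second column of AB = A times the second column of B
second_column_of_AB = mat_vec(A, column(B, 1))

# Formula (9): first row of AB = (first row of A) times B
first_row_of_AB = vec_mat(A[0], B)

print(second_column_of_AB, first_row_of_AB)
```

Only the row or column that is needed gets computed, which is the point of the column and row methods.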


Matrix Products as Linear Combinations 


We have discussed three methods for computing a matrix product AB—entry by entry, column by column, and row by 
row. The following definition provides yet another way of thinking about matrix multiplication. 


DEFINITION 6 


If $A_1, A_2, \ldots, A_r$ are matrices of the same size, and if $c_1, c_2, \ldots, c_r$ are scalars, then an expression of the form

$$c_1A_1 + c_2A_2 + \cdots + c_rA_r$$

is called a linear combination of $A_1, A_2, \ldots, A_r$ with coefficients $c_1, c_2, \ldots, c_r$.


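To see how matrix products can be viewed as linear combinations, let $A$ be an $m \times n$ matrix and $\mathbf{x}$ an $n \times 1$ column vector, say

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \quad \text{and} \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

Then

$$A\mathbf{x} = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix} = x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} \tag{10}$$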

This proves the following theorem. 
THEOREM 1.3.1 


If $A$ is an $m \times n$ matrix, and if $\mathbf{x}$ is an $n \times 1$ column vector, then the product $A\mathbf{x}$ can be expressed as a linear combination of the column vectors of $A$ in which the coefficients are the entries of $\mathbf{x}$.


EXAMPLE 8 Matrix Products as Linear Combinations 


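The matrix product

$$\begin{bmatrix} -1 & 3 & 2 \\ 1 & 2 & -3 \\ 2 & 1 & -2 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}$$

can be written as the following linear combination of column vectors

$$2 \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix} - 1 \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} + 3 \begin{bmatrix} 2 \\ -3 \\ -2 \end{bmatrix} = \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}$$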
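
Theorem 1.3.1 can be checked on the data of Example 8 with a short sketch. The function name `linear_combination` is an illustrative choice, and vectors are assumed to be flat Python lists.

```python
def linear_combination(coeffs, vectors):
    """Return c1*v1 + c2*v2 + ... for equal-length vectors given as lists."""
    n = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(n)]

A = [[-1, 3, 2], [1, 2, -3], [2, 1, -2]]
x = [2, -1, 3]

# Columns of A, each as a flat list
columns = [[row[j] for row in A] for j in range(len(A[0]))]

# Left side: the product Ax computed row by row
product = [sum(a * xk for a, xk in zip(row, x)) for row in A]

# Right side: the columns of A weighted by the entries of x
combo = linear_combination(x, columns)

print(product, combo)
```

Both computations give the same vector, as the theorem states.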


EXAMPLE 9 Columns of a Product AB as Linear Combinations 


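We showed in Example 5 that

$$AB = \begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix} \begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix} = \begin{bmatrix} 12 & 27 & 30 & 13 \\ 8 & -4 & 26 & 12 \end{bmatrix}$$

It follows from Formula (6) and Theorem 1.3.1 that the $j$th column vector of $AB$ can be expressed as a linear combination of the column vectors of $A$ in which the coefficients in the linear combination are the entries from the $j$th column of $B$. The computations are as follows:

$$\begin{aligned} \begin{bmatrix} 12 \\ 8 \end{bmatrix} &= 4\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 0\begin{bmatrix} 2 \\ 6 \end{bmatrix} + 2\begin{bmatrix} 4 \\ 0 \end{bmatrix} \\ \begin{bmatrix} 27 \\ -4 \end{bmatrix} &= \begin{bmatrix} 1 \\ 2 \end{bmatrix} - \begin{bmatrix} 2 \\ 6 \end{bmatrix} + 7\begin{bmatrix} 4 \\ 0 \end{bmatrix} \\ \begin{bmatrix} 30 \\ 26 \end{bmatrix} &= 4\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 3\begin{bmatrix} 2 \\ 6 \end{bmatrix} + 5\begin{bmatrix} 4 \\ 0 \end{bmatrix} \\ \begin{bmatrix} 13 \\ 12 \end{bmatrix} &= 3\begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 2 \\ 6 \end{bmatrix} + 2\begin{bmatrix} 4 \\ 0 \end{bmatrix} \end{aligned}$$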


Matrix Form of a Linear System 


Matrix multiplication has an important application to systems of linear equations. Consider a system of m linear 
equations in n unknowns: 


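$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned}$$

Since two matrices are equal if and only if their corresponding entries are equal, we can replace the m equations in this system by the single matrix equation

$$\begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$

The $m \times 1$ matrix on the left side of this equation can be written as a product to give

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$

If we designate these matrices by $A$, $\mathbf{x}$, and $\mathbf{b}$, respectively, then the original system of m equations in n unknowns has been replaced by the single matrix equation

$$A\mathbf{x} = \mathbf{b}$$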


The matrix A in this equation is called the coefficient matrix of the system. The augmented matrix for the system is 
obtained by adjoining b to A as the last column; thus the augmented matrix is 


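$$[A \mid \mathbf{b}] = \left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right]$$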


The vertical bar in [A|b] is a convenient way to 
separate A from b visually; it has no mathematical 
significance. 
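
As a small illustration of the matrix form $A\mathbf{x} = \mathbf{b}$, the sketch below encodes the system $x_1 + x_2 + 2x_3 = 8$, $-x_1 - 2x_2 + 3x_3 = 1$, $3x_1 - 7x_2 + 4x_3 = 10$ (the system of Exercise 5 in Exercise Set 1.2), forms the augmented matrix, and checks a candidate solution by one matrix-vector product. The helper names `mat_vec`, `augmented`, and `is_solution` are assumptions made for this example.

```python
def mat_vec(A, x):
    """Matrix times column vector, with x given as a flat list."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

def augmented(A, b):
    """Adjoin b to A as the last column, giving [A | b]."""
    return [row + [bi] for row, bi in zip(A, b)]

def is_solution(A, x, b):
    """True when x satisfies every equation of the system Ax = b."""
    return mat_vec(A, x) == b

A = [[1, 1, 2], [-1, -2, 3], [3, -7, 4]]   # coefficient matrix
b = [8, 1, 10]                              # right-hand sides

print(augmented(A, b))
print(is_solution(A, [3, 1, 2], b))   # the answer listed for that exercise
print(is_solution(A, [0, 0, 0], b))   # the zero vector fails here
```

Checking a solution this way uses exact integer arithmetic; with floating-point data a tolerance-based comparison would be the safer choice.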


Transpose of a Matrix 


We conclude this section by defining two matrix operations that have no analogs in the arithmetic of real numbers. 


DEFINITION 7 


If $A$ is any $m \times n$ matrix, then the transpose of $A$, denoted by $A^T$, is defined to be the $n \times m$ matrix that results by interchanging the rows and columns of $A$; that is, the first column of $A^T$ is the first row of $A$, the second column of $A^T$ is the second row of $A$, and so forth.


EXAMPLE 10 SomeTransposes 


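The following are some examples of matrices and their transposes.

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 3 \\ 1 & 4 \\ 5 & 6 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 3 & 5 \end{bmatrix}, \quad D = [4]$$

$$A^T = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \\ a_{14} & a_{24} & a_{34} \end{bmatrix}, \quad B^T = \begin{bmatrix} 2 & 1 & 5 \\ 3 & 4 & 6 \end{bmatrix}, \quad C^T = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}, \quad D^T = [4]$$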


Observe that not only are the columns of $A^T$ the rows of $A$, but the rows of $A^T$ are the columns of $A$. Thus the entry in row $i$ and column $j$ of $A^T$ is the entry in row $j$ and column $i$ of $A$; that is,

$$(A^T)_{ij} = (A)_{ji} \tag{11}$$

Note the reversal of the subscripts.


In the special case where $A$ is a square matrix, the transpose of $A$ can be obtained by interchanging entries that are symmetrically positioned about the main diagonal. In (12) we see that $A^T$ can also be obtained by “reflecting” $A$ about its main diagonal.

$$A = \begin{bmatrix} 1 & -2 & 4 \\ 3 & 7 & 0 \\ -5 & 8 & 6 \end{bmatrix} \;\longrightarrow\; A^T = \begin{bmatrix} 1 & 3 & -5 \\ -2 & 7 & 8 \\ 4 & 0 & 6 \end{bmatrix} \tag{12}$$

Interchange entries that are symmetrically positioned about the main diagonal.
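
Formula (11) translates directly into code. The following minimal sketch assumes matrices are lists of rows; the name `transpose` is an illustrative choice, and the data are the matrix in (12) and the matrix $B$ of Example 10.

```python
def transpose(A):
    """Rows of A become columns: entry (i, j) of the result is entry (j, i) of A."""
    return [list(col) for col in zip(*A)]

A = [[1, -2, 4], [3, 7, 0], [-5, 8, 6]]   # the square matrix in (12)
B = [[2, 3], [1, 4], [5, 6]]              # the 3 x 2 matrix B of Example 10

At = transpose(A)
Bt = transpose(B)
print(At)
print(Bt)
```

Transposing twice returns the original matrix, which gives a quick sanity check on any implementation.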
DEFINITION 8 


If A is a square matrix, then the trace of A, denoted by tr(A), is defined to be the sum of the entries on the 
main diagonal of A. The trace of A is undefined if A is not a square matrix. 


James Sylvester (1814-1897) 


Arthur Cayley (1821-1895) 


Historical Note The term matrix was first used by the English mathematician (and lawyer) James Sylvester, 
who defined the term in 1850 to be an “oblong arrangement of terms.” Sylvester communicated his work on 
matrices to a fellow English mathematician and lawyer named Arthur Cayley, who then introduced some of 
the basic operations on matrices in a book entitled Memoir on the Theory of Matrices that was published in 
1858. As a matter of interest, Sylvester, who was Jewish, did not get his college degree because he refused to 
sign a required oath to the Church of England. He was appointed to a chair at the University of Virginia in the 
United States but resigned after swatting a student with a stick because he was reading a newspaper in class. 


Sylvester, thinking he had killed the student, fled back to England on the first available ship. Fortunately, the 
student was not dead, just in shock! 
[Images: The Granger Collection, New York] 


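EXAMPLE 11 Trace of a Matrix


The following are examples of matrices and their traces.

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad B = \begin{bmatrix} -1 & 2 & 7 & 0 \\ 3 & 5 & -8 & 4 \\ 1 & 2 & 7 & -3 \\ 4 & -2 & 1 & 0 \end{bmatrix}$$

$$\operatorname{tr}(A) = a_{11} + a_{22} + a_{33} \qquad \operatorname{tr}(B) = -1 + 5 + 7 + 0 = 11$$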


In the exercises you will have some practice working with the transpose and trace operations. 
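
Definition 8 can be sketched as follows, again assuming a matrix is a list of rows. The function name `trace` is illustrative, and raising an error is one possible way to model "undefined" for a nonsquare matrix.

```python
def trace(A):
    """Sum of the main-diagonal entries; defined only for square matrices."""
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("the trace is undefined for a nonsquare matrix")
    return sum(A[i][i] for i in range(n))

S = [[1, -2, 4], [3, 7, 0], [-5, 8, 6]]   # the square matrix from (12)
R = [[2, 3], [1, 4], [5, 6]]              # a 3 x 2 matrix, so no trace

print(trace(S))   # 1 + 7 + 6
```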


Concept Review 
Matrix 
Entries 


Column vector (or column matrix) 


Row vector (or row matrix) 


Square matrix 


Main diagonal 


Equal matrices 


Matrix operations: sum, difference, scalar multiplication 


Linear combination of matrices 


Product of matrices (matrix multiplication) 


Partitioned matrices 


Submatrices 


Row-column method 


Column method 


Row method 


Coefficient matrix of a linear system 


Transpose 


Trace 


Skills 


Determine the size of a given matrix. 


Identify the row vectors and column vectors of a given matrix. 


Perform the arithmetic operations of matrix addition, subtraction, scalar multiplication, and multiplication. 


Determine whether the product of two given matrices is defined. 


Compute matrix products using the row-column method, the column method, and the row method. 


Express the product of a matrix and a column vector as a linear combination of the columns of the matrix. 


Express a linear system as a matrix equation, and identify the coefficient matrix. 


Compute the transpose of a matrix. 


Compute the trace of a square matrix. 


Exercise Set 1.3 


1. Suppose that A, B, C, D, and E are matrices with the following sizes: 


A B C D E 
(4x5) (4x5) (5x2) (4x2) (5x4) 


In each part, determine whether the given matrix expression is defined. For those that are defined, give the size of 
the resulting matrix. 


(a) $BA$

(b) $AC + D$
(c) $AE + B$
(d) $AB + B$

(e) $E(A + B)$
(f) $E(AC)$

(g) $E^TA$

(h) $(A^T + E)D$
Answer:


(a) Undefined
(b) $4 \times 2$
(c) Undefined
(d) Undefined
(e) $5 \times 5$
(f) $5 \times 2$


(g) Undefined
(h) $5 \times 2$


2. Suppose that A, B, C, D, and E are matrices with the following sizes:


A B C D E
(3×1) (3×6) (6×2) (2×6) (1×3)


In each part, determine whether the given matrix expression is defined. For those that are defined, give the size of the resulting matrix.


(a) $EA$

(b) $AB^T$

(c) $B^T(A + E^T)$
(d) $2A + C$

(e) $(C^T + D)B^T$
(f) $CD + B^TE^T$
(g) $(BD^T)C^T$
(h) $DC + EA$


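3. Consider the matrices

$$A = \begin{bmatrix} 3 & 0 \\ -1 & 2 \\ 1 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & -1 \\ 0 & 2 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 4 & 2 \\ 3 & 1 & 5 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 5 & 2 \\ -1 & 0 & 1 \\ 3 & 2 & 4 \end{bmatrix}, \quad E = \begin{bmatrix} 6 & 1 & 3 \\ -1 & 1 & 2 \\ 4 & 1 & 3 \end{bmatrix}$$

In each part, compute the given expression (where possible).
(a) $D + E$

(b) $D - E$

(c) $5A$

(d) $-7C$

(e) $2B - C$

(f) $4E - 2D$

(g) $-3(D + 2E)$
(h) $A - A$

(i) $\operatorname{tr}(D)$

(j) $\operatorname{tr}(D - 3E)$
(k) $4\operatorname{tr}(7B)$

(l) $\operatorname{tr}(A)$


Answer:


(a) $\begin{bmatrix} 7 & 6 & 5 \\ -2 & 1 & 3 \\ 7 & 3 & 7 \end{bmatrix}$

(b) $\begin{bmatrix} -5 & 4 & -1 \\ 0 & -1 & -1 \\ -1 & 1 & 1 \end{bmatrix}$

(c) $\begin{bmatrix} 15 & 0 \\ -5 & 10 \\ 5 & 5 \end{bmatrix}$

(d) $\begin{bmatrix} -7 & -28 & -14 \\ -21 & -7 & -35 \end{bmatrix}$

(e) Undefined

(f) $\begin{bmatrix} 22 & -6 & 8 \\ -2 & 4 & 6 \\ 10 & 0 & 4 \end{bmatrix}$

(g) $\begin{bmatrix} -39 & -21 & -24 \\ 9 & -6 & -15 \\ -33 & -12 & -30 \end{bmatrix}$

(h) $\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$

(i) 5
(j) −25
(k) 168


(l) Undefined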

4. Using the matrices in Exercise 3, in each part compute the given expression (where possible).
(a) $2A^T + C$

(b) $D^T - E^T$

(c) $(D - E)^T$

(d) $B^T + 5C^T$

(e) $\tfrac{1}{2}C^T - \tfrac{1}{4}A$

(f) $B - B^T$

(g) $2E^T - 3D^T$

(h) $(2E^T - 3D^T)^T$


(i) $(CD)E$
(j) $C(BA)$
(k) $\operatorname{tr}(DE^T)$
(l) $\operatorname{tr}(BC)$

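5. Using the matrices in Exercise 3, in each part compute the given expression (where possible).
(a) $AB$
(b) $BA$
(c) $(3E)D$
(d) $(AB)C$
(e) $A(BC)$
(f) $CC^T$
(g) $(DA)^T$
(h) $(C^TB)A^T$
(i) $\operatorname{tr}(DD^T)$
(j) $\operatorname{tr}(4E^T - D)$
(k) $\operatorname{tr}(C^TA^T + 2E^T)$
(l) $\operatorname{tr}((EC^T)^TA)$


Answer:
(a) $\begin{bmatrix} 12 & -3 \\ -4 & 5 \\ 4 & 1 \end{bmatrix}$

(b) Undefined

(c) $\begin{bmatrix} 42 & 108 & 75 \\ 12 & -3 & 21 \\ 36 & 78 & 63 \end{bmatrix}$

(d) $\begin{bmatrix} 3 & 45 & 9 \\ 11 & -11 & 17 \\ 7 & 17 & 13 \end{bmatrix}$

(e) $\begin{bmatrix} 3 & 45 & 9 \\ 11 & -11 & 17 \\ 7 & 17 & 13 \end{bmatrix}$

(f) $\begin{bmatrix} 21 & 17 \\ 17 & 35 \end{bmatrix}$

(g) $\begin{bmatrix} 0 & -2 & 11 \\ 12 & 1 & 8 \end{bmatrix}$

(h) $\begin{bmatrix} 12 & 6 & 9 \\ 48 & -20 & 14 \\ 24 & 8 & 16 \end{bmatrix}$

(i) 61
(j) 35
(k) 28
(l) 99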


6. Using the matrices in Exercise 3, in each part compute the given expression (where possible).
(a) $(2D^T - E)A$
(b) $(4B)C + 2B$
(c) $(-AC)^T + 5D^T$

(d) $(BA^T - 2C)^T$

(e) $B^T(CC^T - A^TA)$

(f) $D^TE^T - (ED)^T$

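7. Let

$$A = \begin{bmatrix} 3 & -2 & 7 \\ 6 & 5 & 4 \\ 0 & 4 & 9 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 6 & -2 & 4 \\ 0 & 1 & 3 \\ 7 & 7 & 5 \end{bmatrix}$$

Use the row method or column method (as appropriate) to find
(a) the first row of AB.

(b) the third row of AB.

(c) the second column of AB.

(d) the first column of BA.

(e) the third row of AA.

(f) the third column of AA.


Answer:
(a) $\begin{bmatrix} 67 & 41 & 41 \end{bmatrix}$
(b) $\begin{bmatrix} 63 & 67 & 57 \end{bmatrix}$
(c) $\begin{bmatrix} 41 \\ 21 \\ 67 \end{bmatrix}$
(d) $\begin{bmatrix} 6 \\ 6 \\ 63 \end{bmatrix}$
(e) $\begin{bmatrix} 24 & 56 & 97 \end{bmatrix}$
(f) $\begin{bmatrix} 76 \\ 98 \\ 97 \end{bmatrix}$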


8. Referring to the matrices in Exercise 7, use the row method or column method (as appropriate) to find
(a) the first column of AB. 
(b) the third column of BB. 
(c) the second row of BB. 
(d) the first column of AA. 
(e) the third column of AB. 
(f) the first row of BA. 


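9. Referring to the matrices A and B in Exercise 7, and Example 9,
(a) express each column vector of AA as a linear combination of the column vectors of A.


(b) express each column vector of BB as a linear combination of the column vectors of B.


Answer:

(a) $\begin{bmatrix} -3 \\ 48 \\ 24 \end{bmatrix} = 3\begin{bmatrix} 3 \\ 6 \\ 0 \end{bmatrix} + 6\begin{bmatrix} -2 \\ 5 \\ 4 \end{bmatrix}$; $\quad \begin{bmatrix} 12 \\ 29 \\ 56 \end{bmatrix} = -2\begin{bmatrix} 3 \\ 6 \\ 0 \end{bmatrix} + 5\begin{bmatrix} -2 \\ 5 \\ 4 \end{bmatrix} + 4\begin{bmatrix} 7 \\ 4 \\ 9 \end{bmatrix}$; $\quad \begin{bmatrix} 76 \\ 98 \\ 97 \end{bmatrix} = 7\begin{bmatrix} 3 \\ 6 \\ 0 \end{bmatrix} + 4\begin{bmatrix} -2 \\ 5 \\ 4 \end{bmatrix} + 9\begin{bmatrix} 7 \\ 4 \\ 9 \end{bmatrix}$

(b) $\begin{bmatrix} 64 \\ 21 \\ 77 \end{bmatrix} = 6\begin{bmatrix} 6 \\ 0 \\ 7 \end{bmatrix} + 7\begin{bmatrix} 4 \\ 3 \\ 5 \end{bmatrix}$; $\quad \begin{bmatrix} 14 \\ 22 \\ 28 \end{bmatrix} = -2\begin{bmatrix} 6 \\ 0 \\ 7 \end{bmatrix} + \begin{bmatrix} -2 \\ 1 \\ 7 \end{bmatrix} + 7\begin{bmatrix} 4 \\ 3 \\ 5 \end{bmatrix}$; $\quad \begin{bmatrix} 38 \\ 18 \\ 74 \end{bmatrix} = 4\begin{bmatrix} 6 \\ 0 \\ 7 \end{bmatrix} + 3\begin{bmatrix} -2 \\ 1 \\ 7 \end{bmatrix} + 5\begin{bmatrix} 4 \\ 3 \\ 5 \end{bmatrix}$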
10. Referring to the matrices A and B in Exercise 7, and Example 9, 
(a) express each column vector of AB as a linear combination of the column vectors of A. 
(b) express each column vector of BA as a linear combination of the column vectors of B. 
11. In each part, find matrices A, x, and b that express the given system of linear equations as a single matrix equation 
Ax =b, and write out this matrix equation. 
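(a) $$\begin{aligned} 2x_1 - 3x_2 + 5x_3 &= 7 \\ 9x_1 - x_2 + x_3 &= -1 \\ x_1 + 5x_2 + 4x_3 &= 0 \end{aligned}$$

(b) $$\begin{aligned} 4x_1 - 3x_3 + x_4 &= 1 \\ 5x_1 + x_2 - 8x_4 &= 3 \\ 2x_1 - 5x_2 + 9x_3 - x_4 &= 0 \\ 3x_2 - x_3 + 7x_4 &= 2 \end{aligned}$$

Answer:

(a) $\begin{bmatrix} 2 & -3 & 5 \\ 9 & -1 & 1 \\ 1 & 5 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 7 \\ -1 \\ 0 \end{bmatrix}$

(b) $\begin{bmatrix} 4 & 0 & -3 & 1 \\ 5 & 1 & 0 & -8 \\ 2 & -5 & 9 & -1 \\ 0 & 3 & -1 & 7 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 0 \\ 2 \end{bmatrix}$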


12. In each part, find matrices A, x, and b that express the given system of linear equations as a single matrix equation 
Ax =b, and write out this matrix equation. 


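(a) $$\begin{aligned} x_1 - 2x_2 + 3x_3 &= -3 \\ 2x_1 + x_2 &= 0 \\ -3x_2 + 4x_3 &= 1 \\ x_1 + x_3 &= 5 \end{aligned}$$

(b) $$\begin{aligned} 3x_1 + 3x_2 + 3x_3 &= -3 \\ -x_1 - 5x_2 - 2x_3 &= 3 \\ -4x_2 + x_3 &= 0 \end{aligned}$$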


13. In each part, express the matrix equation as a system of linear equations. 


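(a) $\begin{bmatrix} 5 & 6 & -7 \\ -1 & -2 & 3 \\ 0 & 4 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 3 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 1 & 1 \\ 2 & 3 & 0 \\ 5 & -3 & -6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ -9 \end{bmatrix}$

Answer:
(a) $$\begin{aligned} 5x_1 + 6x_2 - 7x_3 &= 2 \\ -x_1 - 2x_2 + 3x_3 &= 0 \\ 4x_2 - x_3 &= 3 \end{aligned}$$
(b) $$\begin{aligned} x_1 + x_2 + x_3 &= 2 \\ 2x_1 + 3x_2 &= 2 \\ 5x_1 - 3x_2 - 6x_3 &= -9 \end{aligned}$$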


14. In each part, express the matrix equation as a system of linear equations. 


(a) 3 =—1 2)[%1 2 
4 37 =| —1 
—2 1 5{|*3 4 
(b) 3 =—2 0 I1\fw 0 
5 02 =2]/x]| [0 
cs ee ees | a 
—2 51 = &46{L? 0 


In Exercises 15—16, find all values of k, if any, that satisfy the equation. 


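15. $\begin{bmatrix} k & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 2 \\ 0 & 2 & -3 \end{bmatrix} \begin{bmatrix} k \\ 1 \\ 1 \end{bmatrix} = 0$

Answer:
$-1$

16. $\begin{bmatrix} 2 & 2 & k \end{bmatrix} \begin{bmatrix} 1 & 2 & 0 \\ 2 & 0 & 3 \\ 0 & 3 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 2 \\ k \end{bmatrix} = 0$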

In Exercises 17-18, solve the matrix equation for a, b, c, and d. 


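17. $\begin{bmatrix} a & 3 \\ -1 & a + b \end{bmatrix} = \begin{bmatrix} 4 & d - 2c \\ d + 2c & -2 \end{bmatrix}$

Answer:
$a = 4,\ b = -6,\ c = -1,\ d = 1$

18. $\begin{bmatrix} a - b & b + a \\ 3d + c & 2d - c \end{bmatrix} = \begin{bmatrix} 8 & 1 \\ 7 & 6 \end{bmatrix}$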
19. Let A be any jz; x % matrix and let 0 be the jy x » matrix each of whose entries is zero. Show that if i_4 — 4, then 
&k&—Qor4A=—d@. 


20. (a) Show that if AB and BA are both defined, then AB and BA are square matrices. 
(b) Show that if A is an m × n matrix and A(BA) is defined, then B is an n × m matrix.


21. Prove: If A and B are n × n matrices, then tr(A + B) = tr(A) + tr(B).
22. (a) Show that if A has a row of zeros and B is any matrix for which AB is defined, then AB also has a row of
zeros.
(b) Find a similar result involving a column of zeros. 


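23. In each part, find a 6 × 6 matrix $[a_{ij}]$ that satisfies the stated condition. Make your answers as general as possible
by using letters rather than specific numbers for the nonzero entries.


(a) $a_{ij} = 0$ if $i \neq j$
(b) $a_{ij} = 0$ if $i > j$
(c) $a_{ij} = 0$ if $i < j$
(d) $a_{ij} = 0$ if $|i - j| > 1$


Answer:

(a)
\[ \begin{bmatrix} a_{11} & 0 & 0 & 0 & 0 & 0 \\ 0 & a_{22} & 0 & 0 & 0 & 0 \\ 0 & 0 & a_{33} & 0 & 0 & 0 \\ 0 & 0 & 0 & a_{44} & 0 & 0 \\ 0 & 0 & 0 & 0 & a_{55} & 0 \\ 0 & 0 & 0 & 0 & 0 & a_{66} \end{bmatrix} \]

(b)
\[ \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} \\ 0 & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} \\ 0 & 0 & a_{33} & a_{34} & a_{35} & a_{36} \\ 0 & 0 & 0 & a_{44} & a_{45} & a_{46} \\ 0 & 0 & 0 & 0 & a_{55} & a_{56} \\ 0 & 0 & 0 & 0 & 0 & a_{66} \end{bmatrix} \]

(c)
\[ \begin{bmatrix} a_{11} & 0 & 0 & 0 & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 & 0 & 0 \\ a_{31} & a_{32} & a_{33} & 0 & 0 & 0 \\ a_{41} & a_{42} & a_{43} & a_{44} & 0 & 0 \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & 0 \\ a_{61} & a_{62} & a_{63} & a_{64} & a_{65} & a_{66} \end{bmatrix} \]

(d)
\[ \begin{bmatrix} a_{11} & a_{12} & 0 & 0 & 0 & 0 \\ a_{21} & a_{22} & a_{23} & 0 & 0 & 0 \\ 0 & a_{32} & a_{33} & a_{34} & 0 & 0 \\ 0 & 0 & a_{43} & a_{44} & a_{45} & 0 \\ 0 & 0 & 0 & a_{54} & a_{55} & a_{56} \\ 0 & 0 & 0 & 0 & a_{65} & a_{66} \end{bmatrix} \]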


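24. Find the 4 × 4 matrix $A = [a_{ij}]$ whose entries satisfy the stated condition.
(a) $a_{ij} = i + j$
(b) $a_{ij} = i^{\,j-1}$


(c) $a_{ij} = \begin{cases} 1 & \text{if } |i - j| > 1 \\ -1 & \text{if } |i - j| \le 1 \end{cases}$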


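25. Consider the function y = f(x) defined for 2 × 1 matrices x by y = Ax, where

\[ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \]

Plot f(x) together with x in each case below. How would you describe the action of f?


Answer:
\[ f\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 + x_2 \\ x_2 \end{pmatrix} \]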


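26. Let I be the n × n matrix whose entry in row i and column j is
1 if i = j
0 if i ≠ j
Show that AI = IA = A for every n × n matrix A.


27. How many 3 × 3 matrices A can you find such that

\[ A \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x + y \\ x - y \\ 0 \end{bmatrix} \]

for all choices of x, y, and z?


Answer:
One; namely,
\[ A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]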


28. How many 3 × 3 matrices A can you find such that


for all choices of x, y, and z? 


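29. A matrix B is said to be a square root of a matrix A if BB = A.


(a) Find two square roots of $A = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}$.


(b) How many different square roots can you find of $A = \begin{bmatrix} 5 & 0 \\ 0 & 9 \end{bmatrix}$?


(c) Do you think that every 2 × 2 matrix has at least one square root? Explain your reasoning.


Answer:


(a) $\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$, $\begin{bmatrix} -1 & -1 \\ -1 & -1 \end{bmatrix}$
(b) Four; $\begin{bmatrix} \sqrt{5} & 0 \\ 0 & 3 \end{bmatrix}$, $\begin{bmatrix} -\sqrt{5} & 0 \\ 0 & 3 \end{bmatrix}$, $\begin{bmatrix} \sqrt{5} & 0 \\ 0 & -3 \end{bmatrix}$, $\begin{bmatrix} -\sqrt{5} & 0 \\ 0 & -3 \end{bmatrix}$
30. Let 0 denote a 2 × 2 matrix, each of whose entries is zero.


(a) Is there a 2 × 2 matrix A such that A ≠ 0 and AA = 0? Justify your answer.
(b) Is there a 2 × 2 matrix A such that A ≠ 0 and AA = A? Justify your answer.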


True-False Exercises 


In parts (a)-(o) determine whether the statement is true or false, and justify your answer. 
(a) The matrix $\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$ has no main diagonal.


Answer: 


True 


(b) An m × n matrix has m column vectors and n row vectors.
Answer: 


False 


(c) If A and B are 2 × 2 matrices, then AB = BA.
Answer: 


False 


(d) The ith row vector of a matrix product AB can be computed by multiplying A by the ith row vector of B.
Answer: 


False 


(e) For every matrix A, it is true that $(A^T)^T = A$.


Answer: 


True 
(f) If A and B are square matrices of the same order, then tr(AB) = tr(A)tr(B).


Answer: 


False 


(g) If A and B are square matrices of the same order, then $(AB)^T = A^T B^T$.


Answer: 
False 
(h) For every square matrix A, it is true that $\operatorname{tr}(A^T) = \operatorname{tr}(A)$.
Answer: 
True 
(i) If A is a 6 × 4 matrix and B is an m × n matrix such that $B^T A^T$ is a 2 × 6 matrix, then m = 4 and n = 2.


Answer: 


True 


(j) If A is an n × n matrix and c is a scalar, then tr(cA) = c tr(A).
Answer: 


True 


(k) If A, B, and C are matrices of the same size such that A - C = B - C, then A = B.
Answer: 


True 


(l) If A, B, and C are square matrices of the same order such that AC = BC, then A = B.
Answer: 


False 


(m) If AB + BA is defined, then A and B are square matrices of the same size.
Answer: 


True 


(n) If B has a column of zeros, then so does AB if this product is defined. 
Answer: 


True 


(o) If B has a column of zeros, then so does BA if this product is defined. 
Answer: 


False 




1.4 Inverses; Algebraic Properties of Matrices 


In this section we will discuss some of the algebraic properties of matrix operations. We will see that many of 
the basic rules of arithmetic for real numbers hold for matrices, but we will also see that some do not. 


Properties of Matrix Addition and Scalar Multiplication 


The following theorem lists the basic algebraic properties of the matrix operations. 


THEOREM 1.4.1 Properties of Matrix Arithmetic 


Assuming that the sizes of the matrices are such that the indicated operations can be performed, the 
following rules of matrix arithmetic are valid. 

(a) A + B = B + A (Commutative law for addition)

(b) A + (B + C) = (A + B) + C (Associative law for addition)
(c) A(BC) = (AB)C (Associative law for multiplication)

(d) A(B + C) = AB + AC (Left distributive law)

(e) (B + C)A = BA + CA (Right distributive law)

(f) A(B - C) = AB - AC

(g) (B - C)A = BA - CA

(h) a(B + C) = aB + aC

(i) a(B - C) = aB - aC

(j) (a + b)C = aC + bC

(k) (a - b)C = aC - bC

(l) a(bC) = (ab)C

(m) a(BC) = (aB)C = B(aC)


To prove any of the equalities in this theorem we must show that the matrix on the left side has the same size 
as that on the right and that the corresponding entries on the two sides are the same. Most of the proofs follow 
the same pattern, so we will prove part (d) as a sample. The proof of the associative law for multiplication is 
more complicated than the rest and is outlined in the exercises. 


There are three basic ways to prove that two 
matrices of the same size are equal—prove that 
corresponding entries are the same, prove that 
corresponding row vectors are the same, or 
prove that corresponding column vectors are 
the same. 


Proof (d) We must show that A(B + C) and AB + AC have the same size and that corresponding entries
are equal. To form A(B + C), the matrices B and C must have the same size, say m × n, and the matrix A
must then have m columns, so its size must be of the form r × m. This makes A(B + C) an r × n matrix. It
follows that AB + AC is also an r × n matrix and, consequently, A(B + C) and AB + AC have the same size.


Suppose that $A = [a_{ij}]$, $B = [b_{ij}]$, and $C = [c_{ij}]$. We want to show that corresponding entries of
A(B + C) and AB + AC are equal; that is,
\[ [A(B + C)]_{ij} = [AB + AC]_{ij} \]
for all values of i and j. But from the definitions of matrix addition and matrix multiplication, we have
\[ \begin{aligned} [A(B + C)]_{ij} &= a_{i1}(b_{1j} + c_{1j}) + a_{i2}(b_{2j} + c_{2j}) + \cdots + a_{im}(b_{mj} + c_{mj}) \\ &= (a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj}) + (a_{i1}c_{1j} + a_{i2}c_{2j} + \cdots + a_{im}c_{mj}) \\ &= [AB]_{ij} + [AC]_{ij} = [AB + AC]_{ij} \end{aligned} \]


Remark Although the operations of matrix addition and matrix multiplication were defined for pairs of 
matrices, associative laws (b) and (c) enable us to denote sums and products of three matrices as A + B + C
and ABC without inserting any parentheses. This is justified by the fact that no matter how parentheses are 
inserted, the associative laws guarantee that the same end result will be obtained. In general, given any sum or 
any product of matrices, pairs of parentheses can be inserted or deleted anywhere within the expression 
without affecting the end result. 


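EXAMPLE 1 Associativity of Matrix Multiplication


As an illustration of the associative law for matrix multiplication, consider

\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix} \]

Then
\[ AB = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 8 & 5 \\ 20 & 13 \\ 2 & 1 \end{bmatrix} \quad \text{and} \quad BC = \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 10 & 9 \\ 4 & 3 \end{bmatrix} \]
Thus
\[ (AB)C = \begin{bmatrix} 8 & 5 \\ 20 & 13 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 18 & 15 \\ 46 & 39 \\ 4 & 3 \end{bmatrix} \]
and
\[ A(BC) = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 10 & 9 \\ 4 & 3 \end{bmatrix} = \begin{bmatrix} 18 & 15 \\ 46 & 39 \\ 4 & 3 \end{bmatrix} \]

so (AB)C = A(BC), as guaranteed by Theorem 1.4.1(c).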
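The associative law can also be checked numerically. The following is a minimal sketch, assuming matrices are stored as Python lists of rows; the helper name `matmul` and the sample matrices are choices made for this illustration, not notation from the text.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    assert len(X[0]) == len(Y), "inner dimensions must agree"
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

# Illustrative matrices: A is 3 x 2, B and C are 2 x 2.
A = [[1, 2], [3, 4], [0, 1]]
B = [[4, 3], [2, 1]]
C = [[1, 0], [2, 3]]

left = matmul(matmul(A, B), C)    # (AB)C
right = matmul(A, matmul(B, C))   # A(BC)
print(left)
print(right)
print(left == right)
```

Both groupings produce the same 3 × 2 matrix, although the intermediate products AB (3 × 2) and BC (2 × 2) differ in size.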


Properties of Matrix Multiplication 


Do not let Theorem 1.4.1 lull you into believing that all laws of real arithmetic carry over to matrix
arithmetic. For example, you know that in real arithmetic it is always true that ab = ba, which is called the
commutative law for multiplication. In matrix arithmetic, however, the equality of AB and BA can fail for
three possible reasons:


1. AB may be defined and BA may not (for example, if A is 2 × 3 and B is 3 × 4).

2. AB and BA may both be defined, but they may have different sizes (for example, if A is 2 × 3 and B is
3 × 2).

3. AB and BA may both be defined and have the same size, but the two matrices may be different (as 
illustrated in the next example). 


Do not read too much into Example 2. It does
not rule out the possibility that AB and BA may
be equal in certain cases, just that they are not
equal in all cases. If it so happens that
AB = BA, then we say that A and B
commute.


EXAMPLE 2 Order Matters in Matrix Multiplication


Consider the matrices
Multiplying gives


Thus, AB ≠ BA.
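A quick numerical check shows how the order of the factors can change a product. This is a minimal sketch: the two matrices below are hypothetical values chosen for illustration (they are not claimed to be the matrices of the example), and `matmul` is a helper defined here.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    assert len(X[0]) == len(Y), "inner dimensions must agree"
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

# Hypothetical 2 x 2 matrices chosen only for illustration.
A = [[1, 2], [0, 1]]
B = [[1, 0], [3, 1]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB, BA, AB != BA)

# Some pairs do commute: a matrix always commutes with its own square.
A2 = matmul(A, A)
print(matmul(A, A2) == matmul(A2, A))
```

The two products have the same size but different entries, which is the third way that AB = BA can fail.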


Zero Matrices 


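A matrix whose entries are all zero is called a zero matrix. Some examples are
\[ \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad [0] \]


We will denote a zero matrix by 0 unless it is important to specify its size, in which case we will denote the
m × n zero matrix by $0_{m \times n}$.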


It should be evident that if A and 0 are matrices with the same size, then 

A+0=04+A=A 
Thus, 0 plays the same role in this matrix equation that the number 0 plays in the numerical equation
a+0=0+a=a. 


The following theorem lists the basic properties of zero matrices. Since the results should be self-evident, we 
will omit the formal proofs. 


THEOREM 1.4.2 Properties of Zero Matrices 


If c is a scalar, and if the sizes of the matrices are such that the operations can be performed, then:


(a) A + 0 = 0 + A = A


(b) A - 0 = A
(c) A - A = A + (-A) = 0
(d) 0A = 0


(e) If cA = 0, then c = 0 or A = 0.


Since we know that the commutative law of real arithmetic is not valid in matrix arithmetic, it should not be 
surprising that there are other rules that fail as well. For example, consider the following two laws of real 
arithmetic: 


• If ab = ac and a ≠ 0, then b = c. [The cancellation law]
• If ab = 0, then at least one of the factors on the left is 0.


The next two examples show that these laws are not universally true in matrix arithmetic. 


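EXAMPLE 3 Failure of the Cancellation Law


Consider the matrices
\[ A = \begin{bmatrix} 0 & 1 \\ 0 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 1 \\ 3 & 4 \end{bmatrix}, \quad C = \begin{bmatrix} 2 & 5 \\ 3 & 4 \end{bmatrix} \]


We leave it for you to confirm that
\[ AB = AC = \begin{bmatrix} 3 & 4 \\ 6 & 8 \end{bmatrix} \]


Although A ≠ 0, canceling A from both sides of the equation AB = AC would lead to the
incorrect conclusion that B = C. Thus, the cancellation law does not hold, in general, for matrix
multiplication.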


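EXAMPLE 4 A Zero Product with Nonzero Factors


Here are two matrices for which AB = 0, but A ≠ 0 and B ≠ 0:


\[ A = \begin{bmatrix} 0 & 1 \\ 0 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & 7 \\ 0 & 0 \end{bmatrix} \]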
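Both failures can be confirmed by direct multiplication. The sketch below is only an illustration: the matrices `A`, `B`, `C`, and `D` are sample values assumed for this check, and `matmul` is a helper defined here.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    assert len(X[0]) == len(Y), "inner dimensions must agree"
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

# Sample matrices assumed for illustration.
A = [[0, 1], [0, 2]]
B = [[1, 1], [3, 4]]
C = [[2, 5], [3, 4]]
D = [[3, 7], [0, 0]]

# Cancellation fails: AB equals AC even though B and C differ.
print(matmul(A, B), matmul(A, C), B != C)

# A zero product with nonzero factors: AD is the zero matrix.
print(matmul(A, D))
```

Both effects trace back to the first column of A being zero, so A discards the first row of whatever matrix it multiplies.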


Identity Matrices 


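A square matrix with 1's on the main diagonal and zeros elsewhere is called an identity matrix. Some
examples are


\[ [1], \quad \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]


An identity matrix is denoted by the letter I. If it is important to emphasize the size, we will write $I_n$ for the
n × n identity matrix.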


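To explain the role of identity matrices in matrix arithmetic, let us consider the effect of multiplying a general
2 × 3 matrix A on each side by an identity matrix. Multiplying on the right by the 3 × 3 identity matrix yields


\[ AI_3 = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = A \]


and multiplying on the left by the 2 × 2 identity matrix yields
\[ I_2A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = A \]
The same result holds in general; that is, if A is any m × n matrix, then


\[ AI_n = A \quad \text{and} \quad I_mA = A \]


Thus, the identity matrices play the same role in these matrix equations that the number 1 plays in the
numerical equation a · 1 = 1 · a = a.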


As the next theorem shows, identity matrices arise naturally in studying reduced row echelon forms of square 
matrices. 


THEOREM 1.4.3 


If R is the reduced row echelon form of an n × n matrix A, then either R has a row of zeros or R is the
identity matrix $I_n$.


Proof Suppose that the reduced row echelon form of A is


\[ R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{bmatrix} \]


Either the last row in this matrix consists entirely of zeros or it does not. If not, the matrix contains no zero
rows, and consequently each of the n rows has a leading entry of 1. Since these leading 1's occur
progressively farther to the right as we move down the matrix, each of these 1's must occur on the main
diagonal. Since the other entries in the same column as one of these 1's are zero, R must be $I_n$. Thus, either R
has a row of zeros or $R = I_n$.


Inverse of a Matrix 


In real arithmetic every nonzero number a has a reciprocal $a^{-1}$ (= 1/a) with the property


\[ a \cdot a^{-1} = a^{-1} \cdot a = 1 \]


The number $a^{-1}$ is sometimes called the multiplicative inverse of a. Our next objective is to develop an


analog of this result for matrix arithmetic. For this purpose we make the following definition. 


DEFINITION 1 


If A is a square matrix, and if a matrix B of the same size can be found such that AB = BA = I, then A
is said to be invertible (or nonsingular) and B is called an inverse of A. If no such matrix B can be 
found, then A is said to be singular. 


Remark The relationship AB = BA = I is not changed by interchanging A and B, so if A is invertible and B
is an inverse of A, then it is also true that B is invertible, and A is an inverse of B. Thus, when


AB = BA = I


we say that A and B are inverses of one another. 


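EXAMPLE 5 An Invertible Matrix


Let
\[ A = \begin{bmatrix} 2 & -5 \\ -1 & 3 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix} \]


Then


\[ AB = \begin{bmatrix} 2 & -5 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I \]
\[ BA = \begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 2 & -5 \\ -1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I \]


Thus, A and B are invertible and each is an inverse of the other.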


EXAMPLE 6 Class of Singular Matrices


In general, a square matrix with a row or column of zeros is singular. To help understand why
this is so, consider the matrix


\[ A = \begin{bmatrix} 1 & 4 & 0 \\ 2 & 5 & 0 \\ 3 & 6 & 0 \end{bmatrix} \]


To prove that A is singular we must show that there is no 3 × 3 matrix B such that AB = BA = I.
For this purpose let $c_1$, $c_2$, 0 be the column vectors of A. Thus, for any 3 × 3 matrix B we
can express the product BA as


\[ BA = B[c_1 \quad c_2 \quad 0] = [Bc_1 \quad Bc_2 \quad 0] \qquad \text{[Formula (6) of Section 1.3]} \]


The column of zeros shows that BA ≠ I and hence that A is singular.


Properties of Inverses 


It is reasonable to ask whether an invertible matrix can have more than one inverse. The next theorem shows 


that the answer is no—an invertible matrix has exactly one inverse. 


THEOREM 1.4.4 


If B and C are both inverses of the matrix A, then B = C.


Proof Since B is an inverse of A, we have BA = I. Multiplying both sides on the right by C gives
(BA)C = IC = C. But it is also true that (BA)C = B(AC) = BI = B, so C = B.


As a consequence of this important result, we can now speak of “the” inverse of an invertible matrix. If A is
invertible, then its inverse will be denoted by the symbol $A^{-1}$. Thus,


\[ AA^{-1} = I \quad \text{and} \quad A^{-1}A = I \tag{1} \]


The inverse of A plays much the same role in matrix arithmetic that the reciprocal $a^{-1}$ plays in the numerical
relationships $aa^{-1} = 1$ and $a^{-1}a = 1$.


In the next section we will develop a method for computing the inverse of an invertible matrix of any size. 
For now we give the following theorem that specifies conditions under which a 2 x 2 matrix is invertible and 
provides a simple formula for its inverse. 


THEOREM 1.4.5 


The matrix


\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]


is invertible if and only if ad - bc ≠ 0, in which case the inverse is given by the formula


\[ A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \tag{2} \]


We will omit the proof, because we will study a more general version of this theorem later. For now, you
should at least confirm the validity of Formula 2 by showing that $AA^{-1} = A^{-1}A = I$.


Historical Note The formula for $A^{-1}$ given in Theorem 1.4.5 first appeared (in a more general
form) in Arthur Cayley's 1858 Memoir on the Theory of Matrices. The more general result that
Cayley discovered will be studied later.


The quantity ad - bc in Theorem 1.4.5 is
called the determinant of the 2 × 2 matrix A
and is denoted by

\[ \det(A) = ad - bc \]
or alternatively by


\[ \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc \]


Remark Figure 1.4.1 illustrates that the determinant of a 2 × 2 matrix A is the product of the entries on its
main diagonal minus the product of the entries off its main diagonal. In words, Theorem 1.4.5 states that a
2 × 2 matrix A is invertible if and only if its determinant is nonzero, and if invertible, then its inverse can be
obtained by interchanging its diagonal entries, reversing the signs of its off-diagonal entries, and multiplying
the entries by the reciprocal of the determinant of A.


Figure 1.4.1 $\det(A) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$


EXAMPLE 7 Calculating the Inverse of a 2 × 2 Matrix


In each part, determine whether the matrix is invertible. If so, find its inverse.


(a) $A = \begin{bmatrix} 6 & 1 \\ 5 & 2 \end{bmatrix}$


(b) $A = \begin{bmatrix} -1 & 2 \\ 3 & -6 \end{bmatrix}$


Solution


(a) The determinant of A is det(A) = (6)(2) - (1)(5) = 7, which is nonzero. Thus, A is
invertible, and its inverse is


\[ A^{-1} = \frac{1}{7} \begin{bmatrix} 2 & -1 \\ -5 & 6 \end{bmatrix} = \begin{bmatrix} \frac{2}{7} & -\frac{1}{7} \\ -\frac{5}{7} & \frac{6}{7} \end{bmatrix} \]


We leave it for you to confirm that $AA^{-1} = A^{-1}A = I$.
(b) The matrix is not invertible since det(A) = (-1)(-6) - (2)(3) = 0.
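The formula of Theorem 1.4.5 translates directly into a short routine. The following is a minimal sketch using exact rational arithmetic from Python's standard `fractions` module; the function name `inverse_2x2` and the convention of returning `None` for a singular matrix are choices made for this illustration.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Return the inverse of a 2 x 2 matrix as exact fractions, or None if singular."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        return None                      # ad - bc = 0: no inverse exists
    # Swap the diagonal entries, negate the off-diagonal entries, divide by det.
    return [[d / det, -b / det], [-c / det, a / det]]

print(inverse_2x2([[6, 1], [5, 2]]))     # determinant 7, so invertible
print(inverse_2x2([[-1, 2], [3, -6]]))   # determinant 0, so None
```

Using `Fraction` rather than floating point keeps entries such as 2/7 exact, which makes it easy to confirm that the product with the original matrix is the identity.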


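EXAMPLE 8 Solution of a Linear System by Matrix Inversion


A problem that arises in many applications is to solve a pair of equations of the form

\[ \begin{aligned} u &= ax + by \\ v &= cx + dy \end{aligned} \]
for x and y in terms of u and v. One approach is to treat this as a linear system of two equations in the
unknowns x and y and use Gauss-Jordan elimination to solve for x and y. However, because the
coefficients of the unknowns are literal rather than numerical, this procedure is a little clumsy. As an
alternative approach, let us replace the two equations by the single matrix equation


\[ \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]


If we assume that the 2 × 2 matrix is invertible (i.e., ad - bc ≠ 0), then we can multiply through on
the left by the inverse and rewrite the equation as

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]

which simplifies to
\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix} \]
Using Theorem 1.4.5, we can rewrite this equation as


\[ \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix} \]


from which we obtain

\[ x = \frac{du - bv}{ad - bc}, \qquad y = \frac{av - cu}{ad - bc} \]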
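Multiplying out the inverse from Theorem 1.4.5 gives x = (du - bv)/(ad - bc) and y = (av - cu)/(ad - bc), which can be sketched as a small function. This is an illustration only: the name `solve_2x2` and the sample system are assumptions of the sketch, not part of the text.

```python
from fractions import Fraction

def solve_2x2(a, b, c, d, u, v):
    """Solve u = a*x + b*y, v = c*x + d*y for (x, y); requires ad - bc != 0."""
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("coefficient matrix is singular (ad - bc = 0)")
    x = (d * u - b * v) / det
    y = (a * v - c * u) / det
    return x, y

# Sample system (chosen for illustration): 3x - 2y = -1, 4x + 5y = 3.
x, y = solve_2x2(3, -2, 4, 5, -1, 3)
print(x, y)
```

Substituting the returned values back into both equations is a simple way to confirm the result.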


The next theorem is concerned with inverses of matrix products. 


THEOREM 1.4.6 


If A and B are invertible matrices with the same size, then AB is invertible and


\[ (AB)^{-1} = B^{-1}A^{-1} \]


Proof We can establish the invertibility and obtain the stated formula at the same time by showing that


\[ (AB)(B^{-1}A^{-1}) = (B^{-1}A^{-1})(AB) = I \]
But
\[ (AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I \]


and similarly, $(B^{-1}A^{-1})(AB) = I$.


Although we will not prove it, this result can be extended to three or more factors: 


A product of any number of invertible matrices is invertible, and the inverse of the product is the
product of the inverses in the reverse order. 


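EXAMPLE 9 The Inverse of a Product


Consider the matrices
\[ A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & 2 \\ 2 & 2 \end{bmatrix} \]


We leave it for you to show that


\[ AB = \begin{bmatrix} 7 & 6 \\ 9 & 8 \end{bmatrix}, \quad (AB)^{-1} = \begin{bmatrix} 4 & -3 \\ -\frac{9}{2} & \frac{7}{2} \end{bmatrix} \]
and also that
\[ A^{-1} = \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix}, \quad B^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & \frac{3}{2} \end{bmatrix}, \quad B^{-1}A^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & \frac{3}{2} \end{bmatrix} \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 4 & -3 \\ -\frac{9}{2} & \frac{7}{2} \end{bmatrix} \]


Thus, $(AB)^{-1} = B^{-1}A^{-1}$ as guaranteed by Theorem 1.4.6.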
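The reversal of order in Theorem 1.4.6 is easy to get wrong, so a numerical check can help. The following minimal sketch uses sample 2 × 2 matrices and two helpers, `matmul` and `inverse_2x2`, that are defined here for illustration only.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix via the ad - bc formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Sample invertible matrices assumed for this check.
A = [[1, 2], [1, 3]]
B = [[3, 2], [2, 2]]

inv_of_product = inverse_2x2(matmul(A, B))               # (AB)^-1
reversed_order = matmul(inverse_2x2(B), inverse_2x2(A))  # B^-1 A^-1
same_order = matmul(inverse_2x2(A), inverse_2x2(B))      # A^-1 B^-1

print(inv_of_product == reversed_order)
print(inv_of_product == same_order)
```

For these matrices the inverse of the product agrees with the inverses multiplied in reverse order, and it does not agree with the inverses multiplied in the original order.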


Powers of a Matrix 


If A is a square matrix, then we define the nonnegative integer powers of A to be
\[ A^0 = I \quad \text{and} \quad A^n = AA \cdots A \quad [n \text{ factors}] \]
and if A is invertible, then we define the negative integer powers of A to be


\[ A^{-n} = (A^{-1})^n = A^{-1}A^{-1} \cdots A^{-1} \quad [n \text{ factors}] \]


Because these definitions parallel those for real numbers, the usual laws of nonnegative exponents hold; for
example,


\[ A^rA^s = A^{r+s} \quad \text{and} \quad (A^r)^s = A^{rs} \]


If a product of matrices is singular, then at least 
one of the factors must be singular. Why? 


In addition, we have the following properties of negative exponents. 
THEOREM 1.4.7 


If A is invertible and n is a nonnegative integer, then:


(a) $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$.


(b) $A^n$ is invertible and $(A^n)^{-1} = A^{-n} = (A^{-1})^n$.


(c) kA is invertible for any nonzero scalar k, and $(kA)^{-1} = k^{-1}A^{-1}$.


We will prove part (c) and leave the proofs of parts (a) and (b) as exercises.


Proof (c) Properties (c) and (m) in Theorem 1.4.1 imply that


\[ (kA)(k^{-1}A^{-1}) = k^{-1}(kA)A^{-1} = (k^{-1}k)AA^{-1} = (1)I = I \]


and similarly, $(k^{-1}A^{-1})(kA) = I$. Thus, kA is invertible and $(kA)^{-1} = k^{-1}A^{-1}$.


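EXAMPLE 10 Properties of Exponents


Let A and $A^{-1}$ be the matrices in Example 9; that is,
\[ A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} \quad \text{and} \quad A^{-1} = \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} \]
Then
\[ A^{-3} = (A^{-1})^3 = \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 41 & -30 \\ -15 & 11 \end{bmatrix} \]
Also,
\[ A^3 = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} = \begin{bmatrix} 11 & 30 \\ 15 & 41 \end{bmatrix} \]


so, as expected from Theorem 1.4.7(b),


\[ (A^3)^{-1} = \frac{1}{(11)(41) - (30)(15)} \begin{bmatrix} 41 & -30 \\ -15 & 11 \end{bmatrix} = \begin{bmatrix} 41 & -30 \\ -15 & 11 \end{bmatrix} = (A^{-1})^3 \]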
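The definitions of positive and negative powers can be sketched in a few lines. This is a minimal illustration restricted to 2 × 2 matrices; the helper names `matmul`, `inverse_2x2`, and `mat_pow`, and the sample matrix, are assumptions of the sketch.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix via the ad - bc formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_pow(A, n):
    """A**n for a 2 x 2 matrix A and any integer n (negative n needs A invertible)."""
    base = A if n >= 0 else inverse_2x2(A)   # A^(-n) is defined as (A^-1)^n
    result = [[1, 0], [0, 1]]                # A^0 = I
    for _ in range(abs(n)):
        result = matmul(result, base)
    return result

A = [[1, 2], [1, 3]]
print(mat_pow(A, 3))
print(mat_pow(A, -3))
print(matmul(mat_pow(A, 3), mat_pow(A, -3)))   # should be the identity
```

The last line checks part (b) of Theorem 1.4.7 for n = 3: the product of the cube and the negative cube is the identity matrix.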


EXAMPLE 11 The Square of a Matrix Sum


In real arithmetic, where we have a commutative law for multiplication, we can write
\[ (a + b)^2 = a^2 + ab + ba + b^2 = a^2 + ab + ab + b^2 = a^2 + 2ab + b^2 \]


However, in matrix arithmetic, where we have no commutative law for multiplication, the best
we can do is to write


\[ (A + B)^2 = A^2 + AB + BA + B^2 \]


It is only in the special case where A and B commute (i.e., AB = BA) that we can go a step
further and write


\[ (A + B)^2 = A^2 + 2AB + B^2 \]


Matrix Polynomials 


If A is a square matrix, say n × n, and if
\[ p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_mx^m \]


is any polynomial, then we define the n × n matrix p(A) to be
\[ p(A) = a_0I + a_1A + a_2A^2 + \cdots + a_mA^m \tag{3} \]


where I is the n × n identity matrix; that is, p(A) is obtained by substituting A for x and replacing the constant
term $a_0$ by the matrix $a_0I$. An expression of form 3 is called a matrix polynomial in A.


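EXAMPLE 12 A Matrix Polynomial
Find p(A) for


\[ p(x) = x^2 - 2x - 3 \quad \text{and} \quad A = \begin{bmatrix} -1 & 2 \\ 0 & 3 \end{bmatrix} \]


Solution
\[ p(A) = A^2 - 2A - 3I = \begin{bmatrix} 1 & 4 \\ 0 & 9 \end{bmatrix} - \begin{bmatrix} -2 & 4 \\ 0 & 6 \end{bmatrix} - \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \]


or more briefly, p(A) = 0.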
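A matrix polynomial can be evaluated without forming each power separately by using Horner's rule, a standard technique swapped in here for convenience. The sketch below is an illustration under stated assumptions: the names `matmul` and `poly_of_matrix`, the coefficient ordering, and the sample matrix are all choices made for this example.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def poly_of_matrix(coeffs, A):
    """Evaluate a0*I + a1*A + ... + am*A^m, where coeffs = [a0, a1, ..., am]."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for a in reversed(coeffs):
        result = matmul(result, A)       # multiply the running value by A
        for i in range(n):
            result[i][i] += a            # add a*I: the constant goes on the diagonal
    return result

# p(x) = x^2 - 2x - 3 evaluated at a sample 2 x 2 matrix.
A = [[-1, 2], [0, 3]]
print(poly_of_matrix([-3, -2, 1], A))
```

Note how the constant term is added to the diagonal only, which is the code counterpart of replacing the constant a0 by the matrix a0 I.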


Remark It follows from the fact that $A^rA^s = A^{r+s} = A^{s+r} = A^sA^r$ that powers of a square matrix
commute, and since a matrix polynomial in A is built up from powers of A, any two matrix polynomials in A
also commute; that is, for any polynomials $p_1$ and $p_2$ we have


\[ p_1(A)p_2(A) = p_2(A)p_1(A) \tag{4} \]


Properties of the Transpose 


The following theorem lists the main properties of the transpose. 


THEOREM 1.4.8 


If the sizes of the matrices are such that the stated operations can be performed, then: 
(a) $(A^T)^T = A$

(b) $(A + B)^T = A^T + B^T$

(c) $(A - B)^T = A^T - B^T$

(d) $(kA)^T = kA^T$

(e) $(AB)^T = B^TA^T$


If you keep in mind that transposing a matrix interchanges its rows and columns, then you should have little
trouble visualizing the results in parts (a)-(d). For example, part (a) states the obvious fact that interchanging
rows and columns twice leaves a matrix unchanged; and part (b) states that adding two matrices and then
interchanging the rows and columns produces the same result as interchanging the rows and columns before
adding. We will omit the formal proofs. Part (e) is less obvious, but for brevity we will omit its proof as
well. The result in that part can be extended to three or more factors and restated as:


The transpose of a product of any number of matrices is the product of the transposes in the reverse 
order. 
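The reversal of order in the transpose of a product can be checked on non-square matrices, where the sizes alone show that the order matters. This is a minimal sketch; the helpers `matmul` and `transpose` and the sample matrices are assumptions made for illustration.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    assert len(X[0]) == len(Y), "inner dimensions must agree"
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    """Interchange the rows and columns of X."""
    return [list(col) for col in zip(*X)]

A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
B = [[1, 0], [0, 1], [2, -1]]     # 3 x 2

lhs = transpose(matmul(A, B))              # (AB)^T, a 2 x 2 matrix
rhs = matmul(transpose(B), transpose(A))   # B^T A^T, also 2 x 2
wrong = matmul(transpose(A), transpose(B)) # A^T B^T is 3 x 3, so it cannot equal (AB)^T

print(lhs == rhs)
print(len(wrong), len(wrong[0]))
```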


The following theorem establishes a relationship between the inverse of a matrix and the inverse of its 
transpose. 


THEOREM 1.4.9 


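If A is an invertible matrix, then $A^T$ is also invertible and


\[ (A^T)^{-1} = (A^{-1})^T \]


Proof We can establish the invertibility and obtain the formula at the same time by showing that


\[ A^T(A^{-1})^T = (A^{-1})^TA^T = I \]


But from part (e) of Theorem 1.4.8 and the fact that $I^T = I$, we have

\[ \begin{aligned} A^T(A^{-1})^T &= (A^{-1}A)^T = I^T = I \\ (A^{-1})^TA^T &= (AA^{-1})^T = I^T = I \end{aligned} \]

which completes the proof.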


EXAMPLE 13 Inverse of a Transpose


Consider a general 2 × 2 invertible matrix and its transpose:
\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad \text{and} \quad A^T = \begin{bmatrix} a & c \\ b & d \end{bmatrix} \]
Since A is invertible, its determinant ad - bc is nonzero. But the determinant of $A^T$ is also
ad - bc (verify), so $A^T$ is also invertible. It follows from Theorem 1.4.5 that


\[ (A^T)^{-1} = \begin{bmatrix} \dfrac{d}{ad - bc} & -\dfrac{c}{ad - bc} \\[2ex] -\dfrac{b}{ad - bc} & \dfrac{a}{ad - bc} \end{bmatrix} \]


which is the same matrix that results if $A^{-1}$ is transposed (verify). Thus,


\[ (A^T)^{-1} = (A^{-1})^T \]


as guaranteed by Theorem 1.4.9. 
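For a concrete instance of Theorem 1.4.9, one can compare the two sides directly. The sketch below is illustrative only; the helpers `transpose` and `inverse_2x2` and the sample matrix are assumptions of the example.

```python
from fractions import Fraction

def transpose(X):
    """Interchange the rows and columns of X."""
    return [list(col) for col in zip(*X)]

def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix via the ad - bc formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[6, 1], [5, 2]]                          # sample matrix with determinant 7

inverse_of_transpose = inverse_2x2(transpose(A))
transpose_of_inverse = transpose(inverse_2x2(A))
print(inverse_of_transpose == transpose_of_inverse)
```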


Concept Review 


• Commutative law for matrix addition
• Associative law for matrix addition
• Associative law for matrix multiplication
• Left and right distributive laws
• Zero matrix
• Identity matrix
• Inverse of a matrix
• Invertible matrix
• Nonsingular matrix
• Singular matrix
• Determinant
• Power of a matrix
• Matrix polynomial


Skills


• Know the arithmetic properties of matrix operations.
• Be able to prove arithmetic properties of matrices.
• Know the properties of zero matrices.
• Know the properties of identity matrices.
• Be able to recognize when two square matrices are inverses of each other.
• Be able to determine whether a 2 × 2 matrix is invertible.
• Be able to solve a linear system of two equations in two unknowns whose coefficient matrix is
invertible.
• Be able to prove basic properties involving invertible matrices.
• Know the properties of the matrix transpose and its relationship with invertible matrices.


Exercise Set 1.4 


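1. Let
\[ A = \begin{bmatrix} 2 & -1 & 3 \\ 0 & 4 & 5 \\ -2 & 1 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 8 & -3 & -5 \\ 0 & 1 & 2 \\ 4 & -7 & 6 \end{bmatrix}, \quad C = \begin{bmatrix} 0 & -2 & 3 \\ 1 & 7 & 4 \\ 3 & 5 & 9 \end{bmatrix}, \quad a = 4, \quad b = -7 \]
Show that


(a) A + (B + C) = (A + B) + C
(b) (AB)C = A(BC)
(c) (a + b)C = aC + bC
(d) a(B - C) = aB - aC
2. Using the matrices and scalars in Exercise 1, verify that
(a) a(BC) = (aB)C = B(aC)
(b) A(B - C) = AB - AC
(c) (B + C)A = BA + CA
(d) a(bC) = (ab)C


3. Using the matrices and scalars in Exercise 1, verify that
(a) $(A^T)^T = A$
(b) $(A + B)^T = A^T + B^T$
(c) $(aC)^T = aC^T$
(d) $(AB)^T = B^TA^T$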


In Exercises 4—7 use Theorem 1.4.5 to compute the inverses of the following matrices. 


Answer: 


po = 


2S bPle 


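8. Find the inverse of
\[ \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \]


9. Find the inverse of
\[ \begin{bmatrix} \frac{1}{2}(e^x + e^{-x}) & \frac{1}{2}(e^x - e^{-x}) \\ \frac{1}{2}(e^x - e^{-x}) & \frac{1}{2}(e^x + e^{-x}) \end{bmatrix} \]


10. Use the matrix A in Exercise 4 to verify that $(A^T)^{-1} = (A^{-1})^T$.


11. Use the matrix B in Exercise 5 to verify that $(B^T)^{-1} = (B^{-1})^T$.


12. Use the matrices A and B in Exercises 4 and 5 to verify that $(AB)^{-1} = B^{-1}A^{-1}$.


13. Use the matrices A, B, and C in Exercises 4-6 to verify that $(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$.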


In Exercises 14-17, use the given information to find A. 


14. ,-1 2 =1 
Pe 


15. aay =|! 4 


1 =—2 

Answer 

2 

= |i 

7 
A= 

a. 

7 7 

—1 _ = 

* earyte[3 7 


Answer 
=e eee 
13 13 
2 _6 
13 13 


18. Let A be the matrix 


In each part, compute the given quantity. 
(a) 4? 

(b) A> 

(c) $A^2 - 2A + I$

(d) p(A), where p(x) = x - 2

(ec) p(A), where p(x) = Qxt mx 
(f) p(A), where p(x) =x? — 2x +4 


19. Repeat Exercise 18 for the matrix 
ie | 
A= 
2 | 


Answer: 


20. 


2 


— 


(f) [39 13 
26 13 


Repeat Exercise 18 for the matrix 


. Repeat Exercise 18 for the matrix 


0 0 


0.026 0.018 
—0.018 0.026 


0 0 

=—5 =12 

—5 

0 60 
—3 3 
—3 =3 

16 0 0 

0 =—14 =15 

0 15 =—14 


oor coo 
pt 
NM 


() [25 0 0 
0 32 -24 
0 24 32 


In Exercises 22-24, let $p_1(x) = x^2 - 9$, $p_2(x) = x + 3$, and $p_3(x) = x - 3$. Show that
$p_1(A) = p_2(A)p_3(A)$ for the given matrix.


22. The matrix A in Exercise 18. 
23. The matrix A in Exercise 21. 
24. An arbitrary square matrix A. 


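25. Show that if $p(x) = x^2 - (a + d)x + (ad - bc)$ and


\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]


then p(A) = 0.
26. Show that if $p(x) = x^3 - (a + b + e)x^2 + (ab + ae + be - cd)x - a(be - cd)$ and
\[ A = \begin{bmatrix} a & 0 & 0 \\ 0 & b & c \\ 0 & d & e \end{bmatrix} \]
then p(A) = 0.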
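27. Consider the matrix
\[ A = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix} \]
where $a_{11}a_{22} \cdots a_{nn} \neq 0$. Show that A is invertible and find its inverse.
Answer:
\[ A^{-1} = \begin{bmatrix} \frac{1}{a_{11}} & 0 & \cdots & 0 \\ 0 & \frac{1}{a_{22}} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \frac{1}{a_{nn}} \end{bmatrix} \]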
28. Show that if a square matrix A satisfies $A^2 - 3A + I = 0$, then $A^{-1} = 3I - A$.
29. (a) Show that a matrix with a row of zeros cannot have an inverse. 
(b) Show that a matrix with a column of zeros cannot have an inverse. 


30. Assuming that all matrices are n × n and invertible, solve for D.


\[ ABC^TDBA^TC = AB^T \]


31. Assuming that all matrices are n × n and invertible, solve for D.
Cla ade pA aC ac" 
Answer: 
—l 
D=CAB1 4a? (27) A 
32. If A is a square matrix and n is a positive integer, is it true that $(A^n)^T = (A^T)^n$? Justify your answer.
33. Simplify: 
—l 
(AB)! (4c (pc) po 
Answer: 
B —l 
34. Simplify: 
eh aa re afi gal 
(4c (4c }(ac AD 
In Exercises 35-37, determine whether A is invertible, and if so, find the inverse. [Hint: Solve AX = I for X
by equating corresponding entries on the two sides.]


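35.
\[ A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \]


Answer:

\[ A^{-1} = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \end{bmatrix} \]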
36. ket a 
A=/1 0 0 
011 
37. 00 1 
A= qe ae 
—1 11 


38. 


oN |e 
Oo NH | 


Prove Theorem 1.4.2. 


In Exercises 39-42, use the method of Example 8 to find the unique solution of the given linear system.


39. $3x_1 - 2x_2 = -1$
    $4x_1 + 5x_2 = 3$


Answer:
$x_1 = \frac{1}{23}$, $x_2 = \frac{13}{23}$

40. $-x_1 + 5x_2 = 4$
    $-x_1 - 3x_2 = 1$

41. $6x_1 + x_2 = 0$
    $4x_1 - 3x_2 = -2$


Answer:
$x_1 = -\frac{1}{11}$, $x_2 = \frac{6}{11}$

42. $2x_1 - 2x_2 = 4$
    $x_1 + 4x_2 = 4$

43. Prove part (a) of Theorem 1.4.1.

44. Prove part (c) of Theorem 1.4.1.

45. Prove part (f) of Theorem 1.4.1.

46. Prove part (b) of Theorem 1.4.2.

47. Prove part (c) of Theorem 1.4.2.

48. Verify Formula 4 in the text by a direct calculation.

49. Prove part (d) of Theorem 1.4.8.

50. Prove part (e) of Theorem 1.4.8.

51. (a) Show that if A is invertible and AB = AC, then B = C.
(b) Explain why part (a) and Example 3 do not contradict one another.

52. Show that if A is invertible and k is any nonzero scalar, then $(kA)^n = k^nA^n$ for all integer values of n.

53. (a) Show that if A, B, and A + B are invertible matrices with the same size, then


\[ A(A^{-1} + B^{-1})B(A + B)^{-1} = I \]


(b) What does the result in part (a) tell you about the matrix $A^{-1} + B^{-1}$?


54. A square matrix A is said to be idempotent if $A^2 = A$.
(a) Show that if A is idempotent, then so is I - A.


(b) Show that if A is idempotent, then 2A - I is invertible and is its own inverse.


55. Show that if A is a square matrix such that $A^k = 0$ for some positive integer k, then the matrix I - A is
invertible and


\[ (I - A)^{-1} = I + A + A^2 + \cdots + A^{k-1} \]
True-False Exercises 
In parts (a)-(k) determine whether the statement is true or false, and justify your answer. 
(a) Two n × n matrices, A and B, are inverses of one another if and only if AB = BA = 0.
Answer: 


False 


(b) For all square matrices A and B of the same size, it is true that $(A + B)^2 = A^2 + 2AB + B^2$.


Answer: 


False 


(c) For all square matrices A and B of the same size, it is true that $A^2 - B^2 = (A - B)(A + B)$.


Answer: 


False 


(d) If A and B are invertible matrices of the same size, then AB is invertible and $(AB)^{-1} = A^{-1}B^{-1}$.


Answer: 


False 


(e) If A and B are matrices such that AB is defined, then it is true that $(AB)^T = A^TB^T$.


Answer: 


False 
(f) The matrix

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

is invertible if and only if ad - bc ≠ 0.
Answer: 


True 


(g) If A and B are matrices of the same size and k is a constant, then $(kA + B)^T = kA^T + B^T$.


Answer: 


True 


(h) If A is an invertible matrix, then so is $A^T$.
Answer: 


True 


(i) If $p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_mx^m$ and I is an identity matrix, then
$p(I) = a_0 + a_1 + a_2 + \cdots + a_m$.
Answer: 


False 


(j) A square matrix containing a row or column of zeros cannot be invertible. 
Answer: 


True 


(k) The sum of two invertible matrices of the same size must be invertible. 
Answer: 


False 




1.5 Elementary Matrices and a Method for Finding $A^{-1}$


In this section we will develop an algorithm for finding the inverse of a matrix, and we will discuss some of the 
basic properties of invertible matrices. 


In Section 1.1 we defined three elementary row operations on a matrix A: 
1. Multiply a row by a nonzero constant c. 

2. Interchange two rows. 

3. Add a constant c times one row to another. 


It should be evident that if we let B be the matrix that results from A by performing one of the operations in this 
list, then the matrix A can be recovered from B by performing the corresponding operation in the following list: 


1. Multiply the same row by 1/c.
2. Interchange the same two rows.
3. If B resulted by adding c times row $r_1$ of A to row $r_2$, then add -c times $r_1$ to $r_2$.


It follows that if B is obtained from A by performing a sequence of elementary row operations, then there is a 
second sequence of elementary row operations, which when applied to B recovers A (Exercise 43). Accordingly, 
we make the following definition. 


DEFINITION 1 


Matrices A and B are said to be row equivalent if either (hence each) can be obtained from the other by
a sequence of elementary row operations. 


Our next goal is to show how matrix multiplication can be used to carry out an elementary row operation. 


DEFINITION 2 


An $n \times n$ matrix is called an elementary matrix if it can be obtained from the $n \times n$ identity matrix $I_n$
by performing a single elementary row operation.


EXAMPLE 1 Elementary Matrices and Row Operations 


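Listed below are four elementary matrices and the operations that produce them.

\[ \begin{bmatrix} 1 & 0 \\ 0 & -3 \end{bmatrix} \]
Multiply the second row of $I_2$ by $-3$.

\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \]
Interchange the second and fourth rows of $I_4$.

\[ \begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
Add 3 times the third row of $I_3$ to the first row.

\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
Multiply the first row of $I_3$ by 1.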


The following theorem, whose proof is left as an exercise, shows that when a matrix A is multiplied on the left
by an elementary matrix E, the effect is to perform an elementary row operation on A.


THEOREM 1.5.1. Row Operations by Matrix Multiplication 


If the elementary matrix E results from performing a certain row operation on $I_m$ and if A is an $m \times n$
matrix, then the product EA is the matrix that results when this same row operation is performed on A.


EXAMPLE 2 Using Elementary Matrices


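Consider the matrix

\[ A = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 2 & -1 & 3 & 6 \\ 1 & 4 & 4 & 0 \end{bmatrix} \]

and consider the elementary matrix

\[ E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix} \]

which results from adding 3 times the first row of $I_3$ to the third row. The product EA is

\[ EA = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 2 & -1 & 3 & 6 \\ 4 & 4 & 10 & 9 \end{bmatrix} \]

which is precisely the same matrix that results when we add 3 times the first row of A to the third row.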


Theorem 1.5.1 will be a useful tool for 
developing new results about matrices, 
but as a practical matter it is usually 
preferable to perform row operations 
directly. 
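
As an informal check of Theorem 1.5.1, the following Python sketch builds an elementary matrix from the identity and compares the product EA with the result of applying the row operation to A directly. The helper names (`identity`, `matmul`, `add_multiple`) are illustrative choices, not notation from the text, and the matrix A is the one consistent with the product EA shown in Example 2.

```python
from fractions import Fraction

def identity(n):
    """Return the n x n identity matrix with exact rational entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Return the matrix product XY (X is m x r, Y is r x n)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add_multiple(M, c, src, dst):
    """Return a copy of M with c times row src added to row dst (0-based)."""
    R = [row[:] for row in M]
    R[dst] = [R[dst][j] + c * R[src][j] for j in range(len(R[dst]))]
    return R

A = [[1, 0, 2, 3],
     [2, -1, 3, 6],
     [1, 4, 4, 0]]

# E comes from I_3 by adding 3 times the first row to the third row.
E = add_multiple(identity(3), 3, 0, 2)

EA = matmul(E, A)                   # multiply on the left by E
direct = add_multiple(A, 3, 0, 2)   # apply the same operation to A itself
print(EA == direct)
```

The comparison succeeds because both computations produce the same third row, which is the content of the theorem for this one operation.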


We know from the discussion at the beginning of this section that if E is an elementary matrix that results from
performing an elementary row operation on an identity matrix I, then there is a second elementary row
operation, which when applied to E, produces I back again. Table 1 lists these operations. The operations on the
right side of the table are called the inverse operations of the corresponding operations on the left.


Table 1

Row Operation on I That Produces E     Row Operation on E That Reproduces I
Multiply row i by c ≠ 0                Multiply row i by 1/c
Interchange rows i and j               Interchange rows i and j
Add c times row i to row j             Add −c times row i to row j


EXAMPLE 3 Row Operations and Inverse Row Operations 


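In each of the following, an elementary row operation is applied to the $2 \times 2$ identity matrix to
obtain an elementary matrix E, then E is restored to the identity matrix by applying the inverse row
operation.

\[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 7 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]
Multiply the second row by 7; then multiply the second row by 1/7.

\[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]
Interchange the first and second rows; then interchange the first and second rows.

\[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 5 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]
Add 5 times the second row to the first; then add −5 times the second row to the first.


The next theorem is a key result about invertibility of elementary matrices. It will be a building block for many
results that follow.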


THEOREM 1.5.2 


Every elementary matrix is invertible, and the inverse is also an elementary matrix. 


Proof If E is an elementary matrix, then E results by performing some row operation on I. Let $E_0$ be the
matrix that results when the inverse of this operation is performed on I. Applying Theorem 1.5.1 and using the
fact that inverse row operations cancel the effect of each other, it follows that

\[ E_0 E = I \quad \text{and} \quad E E_0 = I \]

Thus, the elementary matrix $E_0$ is the inverse of E.
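
The proof can be illustrated numerically. The sketch below, in Python with exact fractions, builds each of the three kinds of elementary matrix together with the matrix of its inverse operation and checks that the two products are the identity. The function names `scale`, `swap`, and `add_multiple` are hypothetical helpers introduced only for this illustration; rows are numbered from 0.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def scale(n, i, c):
    """Elementary matrix: multiply row i of I_n by the nonzero constant c."""
    E = identity(n)
    E[i][i] = Fraction(c)
    return E

def swap(n, i, j):
    """Elementary matrix: interchange rows i and j of I_n."""
    E = identity(n)
    E[i], E[j] = E[j], E[i]
    return E

def add_multiple(n, c, i, j):
    """Elementary matrix: add c times row i of I_n to row j (i != j)."""
    E = identity(n)
    E[j][i] = Fraction(c)
    return E

# Each pair is (E, E0), where E0 comes from the inverse operation in Table 1.
pairs = [
    (scale(2, 1, 7), scale(2, 1, Fraction(1, 7))),
    (swap(2, 0, 1), swap(2, 0, 1)),
    (add_multiple(2, 5, 1, 0), add_multiple(2, -5, 1, 0)),
]

results = [matmul(E0, E) == identity(2) and matmul(E, E0) == identity(2)
           for E, E0 in pairs]
print(results)
```

The three pairs are the operations of Example 3, so each entry of `results` reports whether $E_0 E = I$ and $E E_0 = I$ hold for that pair.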


Equivalence Theorem 


One of our objectives as we progress through this text is to show how seemingly diverse ideas in linear algebra 
are related. The following theorem, which relates results we have obtained about invertibility of matrices, 
homogeneous linear systems, reduced row echelon forms, and elementary matrices, is our first step in that 
direction. As we study new topics, more statements will be added to this theorem. 


THEOREM 1.5.3 Equivalent Statements 


If A is an $n \times n$ matrix, then the following statements are equivalent, that is, all true or all false.
(a) A is invertible.

(b) $Ax = 0$ has only the trivial solution.

(c) The reduced row echelon form of A is $I_n$.

(d) A is expressible as a product of elementary matrices.


It may make the logic of our proof of Theorem 1.5.3 more apparent by writing the implications

\[ (a) \Rightarrow (b) \Rightarrow (c) \Rightarrow (d) \Rightarrow (a) \]

as a closed cycle. This makes it evident visually that the validity of any one statement implies the validity of all
the others, and hence that the falsity of any one implies the falsity of the others.


Proof We will prove the equivalence by establishing the chain of implications:

\[ (a) \Rightarrow (b) \Rightarrow (c) \Rightarrow (d) \Rightarrow (a) \]

$(a) \Rightarrow (b)$ Assume A is invertible and let $x_0$ be any solution of $Ax = 0$. Multiplying both sides of this equation by the
matrix $A^{-1}$ gives $A^{-1}(Ax_0) = A^{-1}0$, or $(A^{-1}A)x_0 = 0$, or $Ix_0 = 0$, or $x_0 = 0$. Thus, $Ax = 0$ has only the
trivial solution.


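$(b) \Rightarrow (c)$ Let $Ax = 0$ be the matrix form of the system

\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\ &\;\,\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= 0 \end{aligned} \tag{1} \]

and assume that the system has only the trivial solution. If we solve by Gauss-Jordan elimination, then the
system of equations corresponding to the reduced row echelon form of the augmented matrix will be

\[ \begin{aligned} x_1 &= 0 \\ x_2 &= 0 \\ &\;\,\vdots \\ x_n &= 0 \end{aligned} \tag{2} \]

Thus the augmented matrix

\[ \left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} & 0 \end{array}\right] \]

for (1) can be reduced to the augmented matrix

\[ \left[\begin{array}{ccccc|c} 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{array}\right] \]

for (2) by a sequence of elementary row operations. If we disregard the last column (all zeros) in each of these
matrices, we can conclude that the reduced row echelon form of A is $I_n$.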


$(c) \Rightarrow (d)$ Assume that the reduced row echelon form of A is $I_n$, so that A can be reduced to $I_n$ by a finite
sequence of elementary row operations. By Theorem 1.5.1, each of these operations can be accomplished by
multiplying on the left by an appropriate elementary matrix. Thus we can find elementary matrices
$E_1, E_2, \ldots, E_k$ such that

\[ E_k \cdots E_2 E_1 A = I_n \tag{3} \]

By Theorem 1.5.2, $E_1, E_2, \ldots, E_k$ are invertible. Multiplying both sides of Equation 3 on the left successively
by $E_k^{-1}, \ldots, E_2^{-1}, E_1^{-1}$ we obtain

\[ A = E_1^{-1} E_2^{-1} \cdots E_k^{-1} I_n = E_1^{-1} E_2^{-1} \cdots E_k^{-1} \tag{4} \]

By Theorem 1.5.2, this equation expresses A as a product of elementary matrices.


$(d) \Rightarrow (a)$ If A is a product of elementary matrices, then from Theorem 1.4.7 and Theorem 1.5.2, the matrix A
is a product of invertible matrices and hence is invertible.
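
The step from (c) to (d) is constructive, and a short program can mirror it. The following Python sketch reduces A to the identity by Gauss-Jordan elimination, records the inverse of each elementary matrix used, and then multiplies those inverses in order to recover A, as in Equation 4. The function name `factor_into_elementary` and the pivoting order are assumptions made for this illustration; other sequences of row operations give other, equally valid factorizations.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def factor_into_elementary(A):
    """Return [E_1^{-1}, ..., E_k^{-1}] with A equal to their product in
    that order, or None if A cannot be reduced to the identity."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    inverses = []
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                      # A is not invertible
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            F = identity(n)                  # a row interchange is its own inverse
            F[col], F[pivot] = F[pivot], F[col]
            inverses.append(F)
        c = M[col][col]
        if c != 1:
            M[col] = [x / c for x in M[col]]
            F = identity(n)                  # inverse of "multiply by 1/c" is "multiply by c"
            F[col][col] = c
            inverses.append(F)
        for r in range(n):
            m = M[r][col]
            if r != col and m != 0:
                M[r] = [a - m * p for a, p in zip(M[r], M[col])]
                F = identity(n)              # inverse of "add -m times row col" is "add m times row col"
                F[r][col] = m
                inverses.append(F)
    return inverses

A = [[1, 2, 3], [2, 5, 3], [1, 0, 8]]
factors = factor_into_elementary(A)
product = identity(3)
for F in factors:
    product = matmul(product, F)
print(product == A)
```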


A Method for Inverting Matrices 


As a first application of Theorem 1.5.3, we will develop a procedure (or algorithm) that can be used to tell
whether a given matrix is invertible, and if so, produce its inverse. To derive this algorithm, assume for the
moment that A is an invertible $n \times n$ matrix. In Equation 3, the elementary matrices execute a sequence of row
operations that reduce A to $I_n$. If we multiply both sides of this equation on the right by $A^{-1}$ and simplify, we
obtain

\[ A^{-1} = E_k \cdots E_2 E_1 I_n \]

But this equation tells us that the same sequence of row operations that reduces A to $I_n$ will transform $I_n$ to $A^{-1}$.
Thus, we have established the following result.


Inversion Algorithm 


To find the inverse of an invertible matrix A, find a sequence of elementary row operations that reduces
A to the identity and then perform that same sequence of operations on $I_n$ to obtain $A^{-1}$.


A simple method for carrying out this procedure is given in the following example. 


EXAMPLE 4 Using Row Operations to Find $A^{-1}$


Find the inverse of

\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix} \]


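Solution We want to reduce A to the identity matrix by row operations and simultaneously
apply these operations to I to produce $A^{-1}$. To accomplish this we will adjoin the identity matrix
to the right side of A, thereby producing a partitioned matrix of the form

\[ [A \mid I] \]

Then we will apply row operations to this matrix until the left side is reduced to I; these
operations will convert the right side to $A^{-1}$, so the final matrix will have the form

\[ [I \mid A^{-1}] \]

The computations are as follows:

\[ \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 2 & 5 & 3 & 0 & 1 & 0 \\ 1 & 0 & 8 & 0 & 0 & 1 \end{array}\right] \]

\[ \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & -2 & 5 & -1 & 0 & 1 \end{array}\right] \]
We added −2 times the first row to the second and −1 times the first row to the third.

\[ \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & 0 & -1 & -5 & 2 & 1 \end{array}\right] \]
We added 2 times the second row to the third.

\[ \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right] \]
We multiplied the third row by −1.

\[ \left[\begin{array}{ccc|ccc} 1 & 2 & 0 & -14 & 6 & 3 \\ 0 & 1 & 0 & 13 & -5 & -3 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right] \]
We added 3 times the third row to the second and −3 times the third row to the first.

\[ \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -40 & 16 & 9 \\ 0 & 1 & 0 & 13 & -5 & -3 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right] \]
We added −2 times the second row to the first.

Thus,

\[ A^{-1} = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix} \]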
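
The procedure of Example 4 can be sketched in a few lines of Python. The function below, whose name `invert` is an illustrative choice, adjoins the identity to A, reduces the left block by Gauss-Jordan elimination using exact fractions, and returns the right block. It is a minimal sketch rather than a production routine: it reports failure (by returning `None`) as soon as no nonzero pivot is available in a column, which is the situation in which a row of zeros would appear on the left side.

```python
from fractions import Fraction

def invert(A):
    """Return the inverse of the square matrix A, or None if A is not invertible."""
    n = len(A)
    # Form the partitioned matrix [A | I].
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Look for a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                       # left block cannot be reduced to I
        M[col], M[pivot] = M[pivot], M[col]   # interchange rows if necessary
        c = M[col][col]
        M[col] = [x / c for x in M[col]]      # introduce a leading 1
        for r in range(n):
            if r != col:
                m = M[r][col]                 # clear the rest of the column
                M[r] = [a - m * p for a, p in zip(M[r], M[col])]
    return [row[n:] for row in M]             # right block is the inverse

A = [[1, 2, 3], [2, 5, 3], [1, 0, 8]]
A_inv = invert(A)
print(A_inv)
```

For the matrix of Example 4 the right block agrees with the inverse found above, even though the program does not necessarily perform the row operations in the same order as the hand computation.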


Often it will not be known in advance if a given $n \times n$ matrix A is invertible. However, if it is not, then by parts
(a) and (c) of Theorem 1.5.3 it will be impossible to reduce A to $I_n$ by elementary row operations. This will be
signaled by a row of zeros appearing on the left side of the partition at some stage of the inversion algorithm. If
this occurs, then you can stop the computations and conclude that A is not invertible.


EXAMPLE 5 Showing That a Matrix Is Not Invertible 


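Consider the matrix

\[ A = \begin{bmatrix} 1 & 6 & 4 \\ 2 & 4 & -1 \\ -1 & 2 & 5 \end{bmatrix} \]

Applying the procedure of Example 4 yields

\[ \left[\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 2 & 4 & -1 & 0 & 1 & 0 \\ -1 & 2 & 5 & 0 & 0 & 1 \end{array}\right] \]

\[ \left[\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 0 & -8 & -9 & -2 & 1 & 0 \\ 0 & 8 & 9 & 1 & 0 & 1 \end{array}\right] \]
We added −2 times the first row to the second and added the first row to the third.

\[ \left[\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 0 & -8 & -9 & -2 & 1 & 0 \\ 0 & 0 & 0 & -1 & 1 & 1 \end{array}\right] \]
We added the second row to the third.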


Since we have obtained a row of zeros on the left side, A is not invertible. 


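EXAMPLE 6 Analyzing Homogeneous Systems


Use Theorem 1.5.3 to determine whether the given homogeneous system has nontrivial solutions.
(a)
\[ \begin{aligned} x_1 + 2x_2 + 3x_3 &= 0 \\ 2x_1 + 5x_2 + 3x_3 &= 0 \\ x_1 + 8x_3 &= 0 \end{aligned} \]
(b)
\[ \begin{aligned} x_1 + 6x_2 + 4x_3 &= 0 \\ 2x_1 + 4x_2 - x_3 &= 0 \\ -x_1 + 2x_2 + 5x_3 &= 0 \end{aligned} \]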


Solution From parts (a) and (b) of Theorem 1.5.3 a homogeneous linear system has only the 
trivial solution if and only if its coefficient matrix is invertible. From Example 4 and Example 5 
the coefficient matrix of system (a) is invertible and that of system (b) is not. Thus, system (a) has 
only the trivial solution whereas system (b) has nontrivial solutions. 


Concept Review 
• Row equivalent matrices
• Elementary matrix
• Inverse operations
• Inversion algorithm


Skills

• Determine whether a given square matrix is elementary.
• Determine whether two square matrices are row equivalent.
• Apply the inverse of a given elementary row operation to a matrix.
• Apply elementary row operations to reduce a given square matrix to the identity matrix.
• Understand the relationships between statements that are equivalent to the invertibility of a square
matrix (Theorem 1.5.3).
• Use the inversion algorithm to find the inverse of an invertible matrix.
• Express an invertible matrix as a product of elementary matrices.


Exercise Set 1.5 


1. Decide whether each matrix below is an elementary matrix. 


ee 


ol | 
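(c)
\[ \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \]
(d)
\[ \begin{bmatrix} 2 & 0 & 0 & 2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]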
Answer: 


(a) Elementary 

(b) Not elementary 
(c) Not elementary 
(d) Not elementary 


2. Decide whether each matrix below is an elementary matrix. 


Ce 


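(b)
\[ \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \]
(c)
\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 9 \\ 0 & 0 & 1 \end{bmatrix} \]
(d)
\[ \begin{bmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \]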


3. Find a row operation and the corresponding elementary matrix that will restore the given elementary matrix to
the identity matrix.


(b) |} —? 0 0 
010 
00 1 
(c) 100 
010 
—5 0 1 
(d)}9 0 1 0 
0100 
100 0 
000 1 
Answer: 
(a) Add 3 times row 2 to row |: k | 
(b) wk ith D 
Multip! fel! 
ultiply row yrs: 010 
00 1 
(c) 100 
Add 5 times row 1 torow3:}0 1 Q 
5 0 1 
(d) 0010 
0100 
Swap rows | and 3: 1000 
000 1 


4. Find a row operation and the corresponding elementary matrix that will restore the given elementary matrix to
the identity matrix.


d 1 

@)1 0 -2 0 
01 OO 
00. ho 
00 01 


5. In each part, an elementary matrix E and a matrix A are given. Write down the row operation corresponding
to E and show that the product EA results from applying the row operation to A.


(a) p_[9 1] , [-1 -2 5 -1 
e=|( ‘ A=| 3 -6 -6 | 


(b) t 0 2—-1 0 =—4 =—4 
£Z=/0 1 Oj, A=|1 —3 =-1 5 3 
0 +3 1 2 0 1 3 <1 
(c) 10 4 14 
#=/0 1 0], A=/2 5 
OG 8] 36 
Answer: 
(a) Swap rows | and 2: ea=|_ = re =| 
(b) 2—-1 0 =—4 —4 
Add —3 times row 2 torow3:2A=] 1 =—3 =1 5 3 
-1 9 4 =12 =10 
(c) 13 28 
Add 4 times row 3 torow1:2A=| 2 5 
ce 


6. In each part, an elementary matrix E and a matrix A are given. Write down the row operation corresponding
to E and show that the product EA results from applying the row operation to A.


(a) p_[-6 0] ,_ [-1 -2 5 =-1 
e=| 0 i} 4=[ 3 -6 -6 = 


(b) 1 


In Exercises 7—8, use the following matrices. 


3 4 1 a 
A=/2 —7 -1|, B= —} —1 
Be Tf -S 3 4 1 
3 4 oe 
C=|2 —7 =—1|/, D=|—-6 21 3 
2-7 3 3 4 1 
a) 
= 18 11 
3 4 1 


7. Find an elementary matrix E that satisfies the equation. 


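(a) $EA = B$
(b) $EB = A$
(c) $EA = C$
(d) $EC = A$
Answer:
(a)
\[ \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \]
(b)
\[ \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \]
(c)
\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix} \]
(d)
\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix} \]
8. Find an elementary matrix E that satisfies the equation.
(a) $EB = D$
(b) $ED = B$
(c) $EB = F$
(d) $EF = B$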


In Exercises 9—24, use the inversion algorithm to find the inverse of the given matrix, if the inverse exists. 


[Exercises 9-22: the matrices and their answers are not recoverable from the source.]

23.
\[ \begin{bmatrix} -1 & 0 & 1 & 0 \\ 2 & 3 & -2 & 6 \\ 0 & -1 & 2 & 0 \\ 0 & 0 & 1 & 5 \end{bmatrix} \]
Answer: [not recoverable from the source]

24.
\[ \begin{bmatrix} 0 & 0 & 2 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & -1 & 3 & 0 \\ 2 & 1 & 5 & -3 \end{bmatrix} \]


In Exercises 25-26, find the inverse of each of the following $4 \times 4$ matrices, where $k_1$, $k_2$, $k_3$, $k_4$, and k are
all nonzero.


2S-(a) [k, 0 0 0 
0k 0 0 
0 0 kz 0 
0 0 0 ky 


eet oS SS 


ky 
nm’ 
be O90 
29 
0 0 Zz 0 
0 0 o + 


(by. ed 
eee: 0 
0 1 0 0 
opens 
G0. ee 
- 07-0. a 


26-2) [ 0 0 0 ky 


(b) 


In Exercise 27—Exercise 28, find all values of c, if any, for which the given matrix is invertible. 


27. Cre 
say eat 7 
i Ra Lae os 


Answer: 


In Exercises 29-32, write the given matrix as a product of elementary matrices. 
29.| —3 1 
22 


Answer: 


[oa al[o ello allo alls 


31.;1 0 =—2 
04 3 
00 1 


Answer: 


10 =—2 10 =—2}]/1 0 O}/1 0 0 
04 3)/=/0 1 OF;0 1 3]/0 4 0 
as 00 1)/0 0 1]/0 0 1 
32.1 1 0 
ee 
011 


In Exercises 33-36, write the inverse of the given matrix as a product of elementary matrices. 


33. The matrix in Exercise 29. 


Answer: 
yee 
“4 8]_f 1 o]}/-4 offi -17]' ° 
1 3} > [-1 1J/ 4 ,{L9 ijjo 3 
4 3 


34. The matrix in Exercise 30. 


35. The matrix in Exercise 31. 


Answer: 

OB) PEO Ole ae cara eS 
0 + ~2}=]/0 1 ollo 1 ~3]l0 1 0 
ieee s 00 1lloo1 
60> 4). ee 


36. The matrix in Exercise 32. 


In Exercises 37-38, show that the given matrices 4 and B are row equivalent, and find a sequence of 
elementary row operations that produces B from A. 


37. 12 3 1 0 5 

A=|1 41], ®=|0 2 —2 

219 1 1 4 
Answer: 


Add −1 times the first row to the second row. Add −1 times the first row to the third row. Add −1 times
the second row to the first row. Add the second row to the third row.


38. ar. 0 6 9 4 
A=|-1 1 OO], 8=|-5 -1 0 
3 0 =1 -1 -—2 =-1 


39. Show that if 
100 
A=/]0 1 0 
abe 


is an elementary matrix, then at least one entry in the third row must be a zero. 


40. Show that 


Hi 

II 
ocoYyro 
oOo cochoR 


om Oona oO 
~ On CO 
om oc 8 


is not invertible for any values of the entries. 


41. Prove that if A and B are $m \times n$ matrices, then A and B are row equivalent if and only if A and B have the
same reduced row echelon form.


42. Prove that if A is an invertible matrix and B is row equivalent to A, then B is also invertible. 


43. Show that if B is obtained from A by performing a sequence of elementary row operations, then there is a 
second sequence of elementary row operations, which when applied to B recovers A. 


True-False Exercises 

In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 

(a) The product of two elementary matrices of the same size must be an elementary matrix. 
Answer: 


False 


(b) Every elementary matrix is invertible. 
Answer: 


True 


(c) If A and B are row equivalent, and if B and C are row equivalent, then A and C are row equivalent. 
Answer: 


True 


(d) If A is an $n \times n$ matrix that is not invertible, then the linear system $Ax = 0$ has infinitely many solutions.
Answer: 


True 


(e) If A is an $n \times n$ matrix that is not invertible, then the matrix obtained by interchanging two rows of A cannot
be invertible.


Answer: 


True 


(f) If A is invertible and a multiple of the first row of A is added to the second row, then the resulting matrix is 
invertible. 


Answer: 


True 


(g) An expression of the invertible matrix A as a product of elementary matrices is unique. 
Answer: 


False 




1.6 More on Linear Systems and Invertible Matrices


In this section we will show how the inverse of a matrix can be used to solve a linear system and we will develop some more results about 
invertible matrices. 


Number of Solutions of a Linear System 


In Section 1.1 we made the statement (based on Figures 1.1.1 and 1.1.2) that every linear system has either no solutions, has exactly one solution, 
or has infinitely many solutions. We are now in a position to prove this fundamental result. 


THEOREM 1.6.1 


A system of linear equations has zero, one, or infinitely many solutions. There are no other possibilities. 


Proof If $Ax = b$ is a system of linear equations, exactly one of the following is true: (a) the system has no solutions, (b) the system has exactly
one solution, or (c) the system has more than one solution. The proof will be complete if we can show that the system has infinitely many solutions
in case (c).


Assume that $Ax = b$ has more than one solution, and let $x_0 = x_1 - x_2$, where $x_1$ and $x_2$ are any two distinct solutions. Because $x_1$ and $x_2$ are
distinct, the matrix $x_0$ is nonzero; moreover,

\[ Ax_0 = A(x_1 - x_2) = Ax_1 - Ax_2 = b - b = 0 \]

If we now let k be any scalar, then

\[ A(x_1 + kx_0) = Ax_1 + A(kx_0) = Ax_1 + k(Ax_0) = b + k0 = b + 0 = b \]

But this says that $x_1 + kx_0$ is a solution of $Ax = b$. Since $x_0$ is nonzero and there are infinitely many choices for k, the system $Ax = b$ has
infinitely many solutions.
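
The construction in the proof can be tried on a small system. In the Python sketch below, the system $x_1 + x_2 = 2$, $2x_1 + 2x_2 = 4$ and the two solutions chosen for it are assumptions made only for illustration; the code forms $x_0 = x_1 - x_2$ and checks that $x_1 + kx_0$ solves the system for several values of k.

```python
def matvec(A, x):
    """Return the product Ax for a matrix A (list of rows) and a vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 1],
     [2, 2]]
b = [2, 4]

x1 = [2, 0]                              # one solution of Ax = b
x2 = [0, 2]                              # a second, distinct solution
x0 = [p - q for p, q in zip(x1, x2)]     # x0 = x1 - x2, a nonzero solution of Ax = 0

solutions = []
for k in range(-3, 4):
    x = [p + k * q for p, q in zip(x1, x0)]
    if matvec(A, x) == b:                # x1 + k*x0 should solve Ax = b for every k
        solutions.append(tuple(x))

print(len(set(solutions)))
```

Each value of k gives a different solution because $x_0$ is nonzero, so the count printed equals the number of values of k tried.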


Solving Linear Systems by Matrix Inversion 


Thus far we have studied two procedures for solving linear systems: Gauss-Jordan elimination and Gaussian elimination. The following theorem
provides an actual formula for the solution of a linear system of n equations in n unknowns in the case where the coefficient matrix is invertible.


THEOREM 1.6.2 


If A is an invertible $n \times n$ matrix, then for each $n \times 1$ matrix b, the system of equations $Ax = b$ has exactly one solution, namely, $x = A^{-1}b$.


Proof Since $A(A^{-1}b) = b$, it follows that $x = A^{-1}b$ is a solution of $Ax = b$. To show that this is the only solution, we will assume that $x_0$ is an
arbitrary solution and then show that $x_0$ must be the solution $A^{-1}b$.


If $x_0$ is any solution of $Ax = b$, then $Ax_0 = b$. Multiplying both sides of this equation by $A^{-1}$, we obtain $x_0 = A^{-1}b$.


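EXAMPLE 1 Solution of a Linear System Using $A^{-1}$


Consider the system of linear equations

\[ \begin{aligned} x_1 + 2x_2 + 3x_3 &= 5 \\ 2x_1 + 5x_2 + 3x_3 &= 3 \\ x_1 + 8x_3 &= 17 \end{aligned} \]

In matrix form this system can be written as $Ax = b$, where

\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad b = \begin{bmatrix} 5 \\ 3 \\ 17 \end{bmatrix} \]

In Example 4 of the preceding section, we showed that A is invertible and

\[ A^{-1} = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix} \]

By Theorem 1.6.2, the solution of the system is

\[ x = A^{-1}b = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix} \begin{bmatrix} 5 \\ 3 \\ 17 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} \]

or $x_1 = 1$, $x_2 = -1$, $x_3 = 2$.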
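
The arithmetic of this example is easy to confirm by machine. The following Python sketch multiplies the inverse quoted from Example 4 of Section 1.5 by b and then substitutes the result back into the original system; the helper name `matvec` is an illustrative choice.

```python
def matvec(M, v):
    """Return the product Mv for a matrix M (list of rows) and a vector v."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

A = [[1, 2, 3],
     [2, 5, 3],
     [1, 0, 8]]
A_inv = [[-40, 16, 9],
         [13, -5, -3],
         [5, -2, -1]]
b = [5, 3, 17]

x = matvec(A_inv, b)          # x = A^{-1} b, as in Theorem 1.6.2
residual_ok = matvec(A, x) == b
print(x, residual_ok)
```

Substituting x back into $Ax = b$ is a cheap safeguard against a copying error in $A^{-1}$.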


Keep in mind that the method of Example 1 only applies when the
system has as many equations as unknowns and the coefficient
matrix is invertible.


Linear Systems with a Common Coefficient Matrix 


Frequently, one is concerned with solving a sequence of systems

\[ Ax = b_1, \quad Ax = b_2, \quad Ax = b_3, \quad \ldots, \quad Ax = b_k \]

each of which has the same square coefficient matrix A. If A is invertible, then the solutions

\[ x_1 = A^{-1}b_1, \quad x_2 = A^{-1}b_2, \quad x_3 = A^{-1}b_3, \quad \ldots, \quad x_k = A^{-1}b_k \]

can be obtained with one matrix inversion and k matrix multiplications. An efficient way to do this is to form the partitioned matrix

\[ [A \mid b_1 \mid b_2 \mid \cdots \mid b_k] \tag{1} \]

in which the coefficient matrix A is "augmented" by all k of the matrices $b_1, b_2, \ldots, b_k$, and then reduce (1) to reduced row echelon form by
Gauss-Jordan elimination. In this way we can solve all k systems at once. This method has the added advantage that it applies even when A is not
invertible.


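EXAMPLE 2 Solving Two Linear Systems at Once


Solve the systems
(a)
\[ \begin{aligned} x_1 + 2x_2 + 3x_3 &= 4 \\ 2x_1 + 5x_2 + 3x_3 &= 5 \\ x_1 + 8x_3 &= 9 \end{aligned} \]
(b)
\[ \begin{aligned} x_1 + 2x_2 + 3x_3 &= 1 \\ 2x_1 + 5x_2 + 3x_3 &= 6 \\ x_1 + 8x_3 &= -6 \end{aligned} \]


Solution The two systems have the same coefficient matrix. If we augment this coefficient matrix with the columns of constants on
the right sides of these systems, we obtain

\[ \left[\begin{array}{ccc|c|c} 1 & 2 & 3 & 4 & 1 \\ 2 & 5 & 3 & 5 & 6 \\ 1 & 0 & 8 & 9 & -6 \end{array}\right] \]

Reducing this matrix to reduced row echelon form yields (verify)

\[ \left[\begin{array}{ccc|c|c} 1 & 0 & 0 & 1 & 2 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & -1 \end{array}\right] \]

It follows from the last two columns that the solution of system (a) is $x_1 = 1$, $x_2 = 0$, $x_3 = 1$ and the solution of system (b) is $x_1 = 2$,
$x_2 = 1$, $x_3 = -1$.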
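
The "(verify)" step can be delegated to a short program. The Python sketch below implements Gauss-Jordan elimination with exact fractions and applies it to the coefficient matrix augmented by both columns of constants; the function name `rref` and the pivoting order are choices made for this illustration.

```python
from fractions import Fraction

def rref(M):
    """Return the reduced row echelon form of M (a list of rows)."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        c = M[pivot_row][col]
        M[pivot_row] = [x / c for x in M[pivot_row]]        # leading 1
        for r in range(rows):
            if r != pivot_row:
                m = M[r][col]                               # clear the column
                M[r] = [a - m * p for a, p in zip(M[r], M[pivot_row])]
        pivot_row += 1
    return M

# Coefficient matrix augmented by the constants of systems (a) and (b).
augmented = [[1, 2, 3, 4, 1],
             [2, 5, 3, 5, 6],
             [1, 0, 8, 9, -6]]
R = rref(augmented)
solution_a = [row[3] for row in R]
solution_b = [row[4] for row in R]
print(solution_a, solution_b)
```

One elimination handles both right-hand sides, which is the point of the method: the row operations depend only on the coefficient matrix.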


Properties of Invertible Matrices 


Up to now, to show that an $n \times n$ matrix A is invertible, it has been necessary to find an $n \times n$ matrix B such that

\[ AB = I \quad \text{and} \quad BA = I \]

The next theorem shows that if we produce an $n \times n$ matrix B satisfying either condition, then the other condition holds automatically.


THEOREM 1.6.3 


Let A be a square matrix. 
(a) If B is a square matrix satisfying $BA = I$, then $B = A^{-1}$.
(b) If B is a square matrix satisfying $AB = I$, then $B = A^{-1}$.


We will prove part (a) and leave part (b) as an exercise. 


Proof (a) Assume that $BA = I$. If we can show that A is invertible, the proof can be completed by multiplying $BA = I$ on both sides by $A^{-1}$ to
obtain

\[ BAA^{-1} = IA^{-1} \quad \text{or} \quad BI = IA^{-1} \quad \text{or} \quad B = A^{-1} \]

To show that A is invertible, it suffices to show that the system $Ax = 0$ has only the trivial solution (see Theorem 1.5.3). Let $x_0$ be any solution of
this system. If we multiply both sides of $Ax_0 = 0$ on the left by B, we obtain $BAx_0 = B0$ or $Ix_0 = 0$ or $x_0 = 0$. Thus, the system of equations
$Ax = 0$ has only the trivial solution.


Equivalence Theorem 


We are now in a position to add two more statements to the four given in Theorem 1.5.3. 


THEOREM 1.6.4 Equivalent Statements 


If A is an $n \times n$ matrix, then the following are equivalent.
(a) A is invertible.

(b) $Ax = 0$ has only the trivial solution.

(c) The reduced row echelon form of A is $I_n$.

(d) A is expressible as a product of elementary matrices.

(e) $Ax = b$ is consistent for every $n \times 1$ matrix b.

(f) $Ax = b$ has exactly one solution for every $n \times 1$ matrix b.


It follows from the equivalency of parts (e) and (f) that if you can
show that $Ax = b$ has at least one solution for every $n \times 1$ matrix
b, then you can conclude that it has exactly one solution for every
$n \times 1$ matrix b.


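Proof Since we proved in Theorem 1.5.3 that (a), (b), (c), and (d) are equivalent, it will be sufficient to prove that $(a) \Rightarrow (f) \Rightarrow (e) \Rightarrow (a)$.


$(a) \Rightarrow (f)$ This was already proved in Theorem 1.6.2.


$(f) \Rightarrow (e)$ This is self-evident, for if $Ax = b$ has exactly one solution for every $n \times 1$ matrix b, then $Ax = b$ is consistent for every $n \times 1$ matrix b.


$(e) \Rightarrow (a)$ If the system $Ax = b$ is consistent for every $n \times 1$ matrix b, then, in particular, this is so for the systems

\[ Ax = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad Ax = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad Ax = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \]

Let $x_1, x_2, \ldots, x_n$ be solutions of the respective systems, and let us form an $n \times n$ matrix C having these solutions as columns. Thus C has the form

\[ C = [x_1 \mid x_2 \mid \cdots \mid x_n] \]

As discussed in Section 1.3, the successive columns of the product AC will be

\[ Ax_1, \; Ax_2, \; \ldots, \; Ax_n \]

[see Formula 8 of Section 1.3]. Thus,

\[ AC = [Ax_1 \mid Ax_2 \mid \cdots \mid Ax_n] = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I \]

By part (b) of Theorem 1.6.3, it follows that $C = A^{-1}$. Thus, A is invertible.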


We know from earlier work that invertible matrix factors produce an invertible product. Conversely, the following theorem shows that if the
product of square matrices is invertible, then the factors themselves must be invertible.


THEOREM 1.6.5 


Let A and B be square matrices of the same size. If AB is invertible, then A and B must also be invertible.


In our later work the following fundamental problem will occur frequently in various contexts. 


A Fundamental Problem 


Let A be a fixed $m \times n$ matrix. Find all $m \times 1$ matrices b such that the system of equations $Ax = b$ is consistent.


If A is an invertible matrix, Theorem 1.6.2 completely solves this problem by asserting that for every $n \times 1$ matrix b, the linear system $Ax = b$ has
the unique solution $x = A^{-1}b$. If A is not square, or if A is square but not invertible, then Theorem 1.6.2 does not apply. In these cases the matrix b
must usually satisfy certain conditions in order for $Ax = b$ to be consistent. The following example illustrates how the methods of Section 1.2 can
be used to determine such conditions.


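EXAMPLE 3 Determining Consistency by Elimination


What conditions must $b_1$, $b_2$, and $b_3$ satisfy in order for the system of equations

\[ \begin{aligned} x_1 + x_2 + 2x_3 &= b_1 \\ x_1 + x_3 &= b_2 \\ 2x_1 + x_2 + 3x_3 &= b_3 \end{aligned} \]

to be consistent?
Solution The augmented matrix is

\[ \left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 1 & 0 & 1 & b_2 \\ 2 & 1 & 3 & b_3 \end{array}\right] \]

which can be reduced to row echelon form as follows:

\[ \left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 0 & -1 & -1 & b_2 - b_1 \\ 0 & -1 & -1 & b_3 - 2b_1 \end{array}\right] \]
−1 times the first row was added to the second and −2 times the first row was added to the third.

\[ \left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 0 & 1 & 1 & b_1 - b_2 \\ 0 & -1 & -1 & b_3 - 2b_1 \end{array}\right] \]
The second row was multiplied by −1.

\[ \left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 0 & 1 & 1 & b_1 - b_2 \\ 0 & 0 & 0 & b_3 - b_2 - b_1 \end{array}\right] \]
The second row was added to the third.

It is now evident from the third row in the matrix that the system has a solution if and only if $b_1$, $b_2$, and $b_3$ satisfy the condition

\[ b_3 - b_2 - b_1 = 0 \quad \text{or} \quad b_3 = b_1 + b_2 \]

To express this condition another way, $Ax = b$ is consistent if and only if b is a matrix of the form

\[ b = \begin{bmatrix} b_1 \\ b_2 \\ b_1 + b_2 \end{bmatrix} \]

where $b_1$ and $b_2$ are arbitrary.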
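
The condition $b_3 = b_1 + b_2$ can be spot-checked numerically. The Python sketch below row-reduces the augmented matrix for particular values of b and declares the system inconsistent when a row of the form $[0 \; 0 \; 0 \mid c]$ with $c \neq 0$ appears. The function names `rref` and `is_consistent` are illustrative, and testing finitely many right-hand sides is of course evidence rather than a proof.

```python
from fractions import Fraction

def rref(M):
    """Return the reduced row echelon form of M (a list of rows)."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        pivot = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        c = M[pivot_row][col]
        M[pivot_row] = [x / c for x in M[pivot_row]]
        for r in range(rows):
            if r != pivot_row:
                m = M[r][col]
                M[r] = [a - m * p for a, p in zip(M[r], M[pivot_row])]
        pivot_row += 1
    return M

def is_consistent(A, b):
    """Return True if Ax = b has at least one solution."""
    R = rref([row + [bi] for row, bi in zip(A, b)])
    # Inconsistent exactly when some row has zero coefficients but a nonzero constant.
    return not any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in R)

A = [[1, 1, 2],
     [1, 0, 1],
     [2, 1, 3]]

checks = [(is_consistent(A, [b1, b2, b1 + b2]),        # satisfies b3 = b1 + b2
           is_consistent(A, [b1, b2, b1 + b2 + 1]))    # violates it
          for b1 in range(-2, 3) for b2 in range(-2, 3)]
print(all(ok and not bad for ok, bad in checks))
```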


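EXAMPLE 4 Determining Consistency by Elimination


What conditions must $b_1$, $b_2$, and $b_3$ satisfy in order for the system of equations

\[ \begin{aligned} x_1 + 2x_2 + 3x_3 &= b_1 \\ 2x_1 + 5x_2 + 3x_3 &= b_2 \\ x_1 + 8x_3 &= b_3 \end{aligned} \]

to be consistent?
Solution The augmented matrix is

\[ \left[\begin{array}{ccc|c} 1 & 2 & 3 & b_1 \\ 2 & 5 & 3 & b_2 \\ 1 & 0 & 8 & b_3 \end{array}\right] \]

Reducing this to reduced row echelon form yields (verify)

\[ \left[\begin{array}{ccc|c} 1 & 0 & 0 & -40b_1 + 16b_2 + 9b_3 \\ 0 & 1 & 0 & 13b_1 - 5b_2 - 3b_3 \\ 0 & 0 & 1 & 5b_1 - 2b_2 - b_3 \end{array}\right] \tag{2} \]

In this case there are no restrictions on $b_1$, $b_2$, and $b_3$, so the system has the unique solution

\[ x_1 = -40b_1 + 16b_2 + 9b_3, \quad x_2 = 13b_1 - 5b_2 - 3b_3, \quad x_3 = 5b_1 - 2b_2 - b_3 \tag{3} \]

for all values of $b_1$, $b_2$, and $b_3$.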


What does the result in Example 4 tell you about the coefficient 
matrix of the system? 


Skills 
• Determine whether a linear system of equations has no solutions, exactly one solution, or infinitely many solutions.
• Solve a linear system by inverting its coefficient matrix.
• Solve multiple linear systems with the same coefficient matrix simultaneously.
• Be familiar with the additional conditions of invertibility stated in the Equivalence Theorem.


Exercise Set 1.6 


In Exercises 1-8, solve the system by inverting the coefficient matrix and using Theorem 1.6.2. 


1.
\[ \begin{aligned} x_1 + x_2 &= 2 \\ 5x_1 + 6x_2 &= 9 \end{aligned} \]


Answer: 


3.
\[ \begin{aligned} x_1 + 3x_2 + x_3 &= 4 \\ 2x_1 + 2x_2 + x_3 &= -1 \\ 2x_1 + 3x_2 + x_3 &= 3 \end{aligned} \]


Answer:


$x_1 = -1$, $x_2 = 4$, $x_3 = -7$

4. $5x_1 + 3x_2 + 2x_3 =$
$3x_1 + 3x_2 + 2x_3 = 2$
$x_2 + x_3 = 5$
5.
\[ \begin{aligned} x + y + z &= 5 \\ x + y - 4z &= 10 \\ -4x + y + z &= 0 \end{aligned} \]


Answer:


$x = 1$, $y = 5$, $z = -1$
6. =—x-ay-32 = 
w+ x+4y +4z 
w+ 3x-iy+9z2 = 
<w-2x=—4y-6z2 = 
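7.
\[ \begin{aligned} 3x_1 + 5x_2 &= b_1 \\ x_1 + 2x_2 &= b_2 \end{aligned} \]


Answer:


$x_1 = 2b_1 - 5b_2$, $x_2 = -b_1 + 3b_2$

8.
\[ \begin{aligned} x_1 + 2x_2 + 3x_3 &= b_1 \\ 2x_1 + 5x_2 + 5x_3 &= b_2 \\ 3x_1 + 5x_2 + 8x_3 &= b_3 \end{aligned} \]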


In Exercises 9-12, solve the linear systems together by reducing the appropriate augmented matrix. 


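9.
\[ \begin{aligned} x_1 - 5x_2 &= b_1 \\ 3x_1 + 2x_2 &= b_2 \end{aligned} \]
(i) $b_1 = 1$, $b_2 = 4$
(ii) $b_1 = -2$, $b_2 = 5$


Answer:

(i) $x_1 = \frac{22}{17}$, $x_2 = \frac{1}{17}$
(ii) $x_1 = \frac{21}{17}$, $x_2 = \frac{11}{17}$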


10. —x1+4xg+ x3 = 44 
X1+9x2— 2x3 
6x, +4xg— 8x3 = 43 
(i) #1=0, 42=1, 43=0 
(i) 21= —3, 42=4, b3= -5 


Il 
oe 
bo 


11. 4x1 — 7x2 = 44 
X14 2x2 = 43 
(i) 21=90, 42=1 
(ii) ®1= —4, b2=6 
(iii) 21 = —1, 42=3 
(iv) 9) = —5, 29=1 


Answer: 
(Dt See 5 As 
Pag te 1S 
Gi) pp eet a5 28 
AL "457 4245 
Gli), 2 19 2 13 
RSG Aa 16 
iv = 1 a3 
aes 5° 425 
12. x1 + 3x24 5x3= 4, 
=x, — 2x3 =) 


2x1 + 5x2 + 4x3 = 43 

(i) 4:=1, 42=0, d3=-1 
(ii) 21=9, 42=1, 43=1 

ii) 21 = -—1, 2=—-1, 43=0 


In Exercises 13-17, determine conditions on the $b_i$'s, if any, in order to guarantee that the linear system is consistent.


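13.
\[ \begin{aligned} x_1 + 3x_2 &= b_1 \\ 2x_1 + x_2 &= b_2 \end{aligned} \]


Answer:


No conditions on $b_1$ and $b_2$

14.
\[ \begin{aligned} 6x_1 - 4x_2 &= b_1 \\ 3x_1 - 2x_2 &= b_2 \end{aligned} \]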

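15.
\[ \begin{aligned} x_1 - 2x_2 + 5x_3 &= b_1 \\ 4x_1 - 5x_2 + 8x_3 &= b_2 \\ -3x_1 + 3x_2 - 3x_3 &= b_3 \end{aligned} \]
Answer:
$b_3 = b_1 - b_2$

16.
\[ \begin{aligned} x_1 - 2x_2 - x_3 &= b_1 \\ -4x_1 + 5x_2 + 2x_3 &= b_2 \\ -4x_1 + 7x_2 + 4x_3 &= b_3 \end{aligned} \]

17.
\[ \begin{aligned} x_1 - x_2 + 3x_3 + 2x_4 &= b_1 \\ -2x_1 + x_2 + 5x_3 + x_4 &= b_2 \\ -3x_1 + 2x_2 + 2x_3 - x_4 &= b_3 \\ 4x_1 - 3x_2 + x_3 + 3x_4 &= b_4 \end{aligned} \]
Answer:


$b_1 = b_3 + b_4$, $b_2 = 2b_3 + b_4$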


18. Consider the matrices 


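\[ A = \begin{bmatrix} 2 & 1 & 2 \\ 2 & 2 & -2 \\ 3 & 1 & 1 \end{bmatrix} \quad \text{and} \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \]


(a) Show that the equation $Ax = x$ can be rewritten as $(A - I)x = 0$ and use this result to solve $Ax = x$ for x.


(b) Solve $Ax = 4x$.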


In Exercises 19-20, solve the given matrix equation for X. 


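19.
\[ \begin{bmatrix} 1 & -1 & 1 \\ 2 & 3 & 0 \\ 0 & 2 & -1 \end{bmatrix} X = \begin{bmatrix} 2 & -1 & 5 & 7 & 8 \\ 4 & 0 & -3 & 0 & 1 \\ 3 & 5 & -7 & 2 & 1 \end{bmatrix} \]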
Answer 


20.;-2 9 1 
1 


21. Let $Ax = 0$ be a homogeneous system of n linear equations in n unknowns that has only the trivial solution. Show that if k is any positive
integer, then the system $A^k x = 0$ also has only the trivial solution.


22. Let $Ax = 0$ be a homogeneous system of n linear equations in n unknowns, and let Q be an invertible $n \times n$ matrix. Show that $Ax = 0$ has just
the trivial solution if and only if $(QA)x = 0$ has just the trivial solution.


23. Let $Ax = b$ be any consistent system of linear equations, and let $x_1$ be a fixed solution. Show that every solution to the system can be written in
the form $x = x_1 + x_0$, where $x_0$ is a solution to $Ax = 0$. Show also that every matrix of this form is a solution.


24. Use part (a) of Theorem 1.6.3 to prove part (b). 


True-False Exercises 

In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 

(a) It is impossible for a system of linear equations to have exactly two solutions.
Answer: 


True 


(b) If the linear system $Ax = b$ has a unique solution, then the linear system $Ax = c$ also must have a unique solution.
Answer: 


True 


(c) If A and B are $n \times n$ matrices such that $AB = I_n$, then $BA = I_n$.
Answer: 


True 


(d) If A and B are row equivalent matrices, then the linear systems $Ax = 0$ and $Bx = 0$ have the same solution set.
Answer: 


True 


(e) If A is an $n \times n$ matrix and S is an $n \times n$ invertible matrix, then if x is a solution to the linear system $(S^{-1}AS)x = b$, then Sx is a solution to the
linear system $Ay = Sb$.
Answer: 


True 


(f) Let A be an $n \times n$ matrix. The linear system $Ax = 4x$ has a unique solution if and only if $A - 4I$ is an invertible matrix.
Answer: 


True 


(g) Let A and B be $n \times n$ matrices. If A or B (or both) are not invertible, then neither is AB.
Answer: 


True 




1.7 Diagonal, Triangular, and Symmetric Matrices 


In this section we will discuss matrices that have various special forms. These matrices arise in a wide variety of applications 
and will also play an important role in our subsequent work. 
Diagonal Matrices 


A square matrix in which all the entries off the main diagonal are zero is called a diagonal matrix. Here are some examples: 


[Example matrices not recoverable from the source.]


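A general $n \times n$ diagonal matrix D can be written as

\[ D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix} \tag{1} \]

A diagonal matrix is invertible if and only if all of its diagonal entries are nonzero; in this case the inverse of (1) is

\[ D^{-1} = \begin{bmatrix} 1/d_1 & 0 & \cdots & 0 \\ 0 & 1/d_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1/d_n \end{bmatrix} \tag{2} \]

Confirm Formula 2 by showing that

\[ DD^{-1} = D^{-1}D = I \]

Powers of diagonal matrices are easy to compute; we leave it for you to verify that if D is the diagonal matrix (1) and k is a
positive integer, then

\[ D^k = \begin{bmatrix} d_1^k & 0 & \cdots & 0 \\ 0 & d_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & d_n^k \end{bmatrix} \tag{3} \]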

EXAMPLE 1 Inverses and Powers of Diagonal Matrices 


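If

$$
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 2 \end{bmatrix}
$$

then

$$
A^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\frac{1}{3} & 0 \\ 0 & 0 & \frac{1}{2} \end{bmatrix}, \quad
A^{5} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -243 & 0 \\ 0 & 0 & 32 \end{bmatrix}, \quad
A^{-5} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\frac{1}{243} & 0 \\ 0 & 0 & \frac{1}{32} \end{bmatrix}
$$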
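
Formulas 2 and 3 can be checked numerically. The following is a minimal Python sketch, assuming a diagonal matrix is stored simply as the list of its diagonal entries; the helper names `diag_inverse` and `diag_power` are illustrative and do not come from the text. The sample entries 1, −3, 2 are the ones whose fifth powers 1, −243, 32 appear in Example 1.

```python
from fractions import Fraction

def diag_inverse(d):
    """Return the diagonal of D^{-1}, given the diagonal entries d of D (Formula 2)."""
    if any(x == 0 for x in d):
        raise ValueError("a diagonal matrix with a zero diagonal entry is not invertible")
    return [1 / Fraction(x) for x in d]

def diag_power(d, k):
    """Return the diagonal of D^k (Formula 3); a negative k requires D to be invertible."""
    if k < 0:
        return [x ** (-k) for x in diag_inverse(d)]
    return [Fraction(x) ** k for x in d]

a = [1, -3, 2]            # assumed diagonal entries of A
print(diag_inverse(a))    # diagonal of A^{-1}
print(diag_power(a, 5))   # diagonal of A^5
print(diag_power(a, -5))  # diagonal of A^{-5}
```

Exact fractions are used here so that entries such as 1/243 are not rounded.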


Matrix products that involve diagonal factors are especially easy to compute. For example, 
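$$
\begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix}
=
\begin{bmatrix} d_1a_{11} & d_1a_{12} & d_1a_{13} & d_1a_{14} \\ d_2a_{21} & d_2a_{22} & d_2a_{23} & d_2a_{24} \\ d_3a_{31} & d_3a_{32} & d_3a_{33} & d_3a_{34} \end{bmatrix}
$$

$$
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{bmatrix}
\begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix}
=
\begin{bmatrix} d_1a_{11} & d_2a_{12} & d_3a_{13} \\ d_1a_{21} & d_2a_{22} & d_3a_{23} \\ d_1a_{31} & d_2a_{32} & d_3a_{33} \\ d_1a_{41} & d_2a_{42} & d_3a_{43} \end{bmatrix}
$$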


In words, to multiply a matrix A on the left by a diagonal matrix D, one can multiply successive rows of A by the 
successive diagonal entries of D, and to multiply A on the right by D, one can multiply successive columns of A by the 
successive diagonal entries of D. 
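
The row and column scaling rule can be tried on a small case. This is a sketch only, with matrices stored as lists of rows; the helpers `matmul` and `diag` and the sample entries are made up for illustration.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def diag(d):
    """Build a square diagonal matrix from the list of diagonal entries d."""
    n = len(d)
    return [[d[i] if i == j else 0 for j in range(n)] for i in range(n)]

d = [2, -1, 3]
A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]      # 3 x 4, multiplied on the left by D
B = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9],
     [10, 11, 12]]         # 4 x 3, multiplied on the right by D

# Left multiplication scales row i of A by d[i].
left = matmul(diag(d), A)
rows_scaled = [[d[i] * x for x in row] for i, row in enumerate(A)]

# Right multiplication scales column j of B by d[j].
right = matmul(B, diag(d))
cols_scaled = [[d[j] * x for j, x in enumerate(row)] for row in B]

print(left == rows_scaled, right == cols_scaled)
```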


Triangular Matrices 


A square matrix in which all the entries above the main diagonal are zero is called lower triangular, and a square matrix in
which all the entries below the main diagonal are zero is called upper triangular. A matrix that is either upper triangular or 
lower triangular is called triangular. 


EXAMPLE 2 Upper and Lower Triangular Matrices 


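$$
\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ 0 & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & a_{44} \end{bmatrix}
\qquad
\begin{bmatrix} a_{11} & 0 & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ a_{31} & a_{32} & a_{33} & 0 \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}
$$

Left: a general $4 \times 4$ upper triangular matrix. Right: a general $4 \times 4$ lower triangular matrix.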


Remark Observe that diagonal matrices are both upper triangular and lower triangular since they have zeros below and 
above the main diagonal. Observe also that a square matrix in row echelon form is upper triangular since it has zeros below 
the main diagonal. 


Properties of Triangular Matrices 


Example 2 illustrates the following four facts about triangular matrices that we will state without formal proof. 


• A square matrix $A = [a_{ij}]$ is upper triangular if and only if all entries to the left of the main diagonal are zero; that is,
$a_{ij} = 0$ if $i > j$ (Figure 1.7.1).

• A square matrix $A = [a_{ij}]$ is lower triangular if and only if all entries to the right of the main diagonal are zero; that is,
$a_{ij} = 0$ if $i < j$ (Figure 1.7.1).

• A square matrix $A = [a_{ij}]$ is upper triangular if and only if the $i$th row starts with at least $i - 1$ zeros for every $i$.

• A square matrix $A = [a_{ij}]$ is lower triangular if and only if the $j$th column starts with at least $j - 1$ zeros for every $j$.


Figure 1.7.1 


The following theorem lists some of the basic properties of triangular matrices. 


THEOREM 1.7.1 


(a) The transpose of a lower triangular matrix is upper triangular, and the transpose of an upper triangular matrix is 
lower triangular. 


(b) The product of lower triangular matrices is lower triangular, and the product of upper triangular matrices is upper 
triangular. 
(c) A triangular matrix is invertible if and only if its diagonal entries are all nonzero. 


(d) The inverse of an invertible lower triangular matrix is lower triangular, and the inverse of an invertible upper 
triangular matrix is upper triangular. 


Part (a) is evident from the fact that transposing a square matrix can be accomplished by reflecting the entries about the main 
diagonal; we omit the formal proof. We will prove (b), but we will defer the proofs of (c) and (d) to the next chapter, where 
we will have the tools to prove those results more efficiently. 


Proof (b) We will prove the result for lower triangular matrices; the proof for upper triangular matrices is similar. Let
$A = [a_{ij}]$ and $B = [b_{ij}]$ be lower triangular $n \times n$ matrices, and let $C = [c_{ij}]$ be the product $C = AB$. We can prove that $C$
is lower triangular by showing that $c_{ij} = 0$ for $i < j$. But from the definition of matrix multiplication,

$$
c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}
$$

If we assume that $i < j$, then the terms in this expression can be grouped as follows:

$$
c_{ij} = \underbrace{a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{i(j-1)}b_{(j-1)j}}_{\substack{\text{Terms in which the row number of } b \\ \text{is less than the column number of } b}}
+ \underbrace{a_{ij}b_{jj} + \cdots + a_{in}b_{nj}}_{\substack{\text{Terms in which the row number of } a \\ \text{is less than the column number of } a}}
$$

In the first grouping all of the $b$ factors are zero since $B$ is lower triangular, and in the second grouping all of the $a$ factors are
zero since $A$ is lower triangular. Thus, $c_{ij} = 0$, which is what we wanted to prove.


EXAMPLE 3 Computations with Triangular Matrices


Consider the upper triangular matrices 


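$$
A = \begin{bmatrix} 1 & 3 & -1 \\ 0 & 2 & 4 \\ 0 & 0 & 5 \end{bmatrix}, \quad
B = \begin{bmatrix} 3 & -2 & 2 \\ 0 & 0 & -1 \\ 0 & 0 & 1 \end{bmatrix}
$$

It follows from part (c) of Theorem 1.7.1 that the matrix $A$ is invertible but the matrix $B$ is not. Moreover, the
theorem also tells us that $A^{-1}$, $AB$, and $BA$ must be upper triangular. We leave it for you to confirm these three
statements by showing that

$$
A^{-1} = \begin{bmatrix} 1 & -\frac{3}{2} & \frac{7}{5} \\ 0 & \frac{1}{2} & -\frac{2}{5} \\ 0 & 0 & \frac{1}{5} \end{bmatrix}, \quad
AB = \begin{bmatrix} 3 & -2 & -2 \\ 0 & 0 & 2 \\ 0 & 0 & 5 \end{bmatrix}, \quad
BA = \begin{bmatrix} 3 & 5 & -1 \\ 0 & 0 & -5 \\ 0 & 0 & 5 \end{bmatrix}
$$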
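
The three statements in Example 3 can also be confirmed by machine. The sketch below is one possible check, not a method prescribed by the text: it inverts an upper triangular matrix by back substitution, one column at a time, using exact fractions. The helper names are hypothetical, and the entries of `A` and `B` are those of Example 3 as read here.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_upper_triangular(M):
    """True if every entry below the main diagonal is zero."""
    return all(M[i][j] == 0 for i in range(len(M)) for j in range(i))

def upper_inverse(U):
    """Invert an upper triangular matrix by solving U x = e_j for each column j."""
    n = len(U)
    if any(U[i][i] == 0 for i in range(n)):
        raise ValueError("zero diagonal entry: the matrix is not invertible")
    X = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        for i in range(n - 1, -1, -1):          # back substitution, bottom row first
            rhs = Fraction(1 if i == j else 0)
            rhs -= sum(U[i][k] * X[k][j] for k in range(i + 1, n))
            X[i][j] = rhs / U[i][i]
    return X

A = [[1, 3, -1], [0, 2, 4], [0, 0, 5]]
B = [[3, -2, 2], [0, 0, -1], [0, 0, 1]]

A_inv = upper_inverse(A)
print(A_inv)
print(matmul(A, B), matmul(B, A))
print(all(is_upper_triangular(M) for M in (A_inv, matmul(A, B), matmul(B, A))))
```

Calling `upper_inverse(B)` raises an error because $B$ has zeros on its main diagonal, in agreement with part (c) of Theorem 1.7.1.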


Symmetric Matrices 


DEFINITION 1 


A square matrix $A$ is said to be symmetric if $A = A^T$.


It is easy to recognize a symmetric matrix by 
inspection: The entries on the main diagonal have no 
restrictions, but mirror images of entries across the 
main diagonal must be equal. Here is a picture using 
the second matrix in Example 4: 


All diagonal matrices, such as the third matrix in 
Example 4, obviously have this property. 


EXAMPLE 4 Symmetric Matrices


The following matrices are symmetric, since each is equal to its own transpose (verify). 


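$$
\begin{bmatrix} 7 & -3 \\ -3 & 5 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 4 & 5 \\ 4 & -3 & 0 \\ 5 & 0 & 7 \end{bmatrix}, \quad
\begin{bmatrix} d_1 & 0 & 0 & 0 \\ 0 & d_2 & 0 & 0 \\ 0 & 0 & d_3 & 0 \\ 0 & 0 & 0 & d_4 \end{bmatrix}
$$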


Remark It follows from Formula 11 of Section 1.3 that a square matrix $A = [a_{ij}]$ is symmetric if and only if

$$
(A)_{ij} = (A)_{ji} \tag{4}
$$

for all values of $i$ and $j$.


The following theorem lists the main algebraic properties of symmetric matrices. The proofs are direct consequences of 
Theorem 1.4.8 and are omitted. 


THEOREM 1.7.2 


If A and B are symmetric matrices with the same size, and if k is any scalar, then: 
(a) $A^T$ is symmetric.

(b) $A + B$ and $A - B$ are symmetric.

(c) $kA$ is symmetric.


It is not true, in general, that the product of symmetric matrices is symmetric. To see why this is so, let A and B be symmetric 
matrices with the same size. Then it follows from part (e) of Theorem 1.4.8 and the symmetry of A and B that 


$$
(AB)^T = B^TA^T = BA
$$

Thus, $(AB)^T = AB$ if and only if $AB = BA$, that is, if and only if $A$ and $B$ commute. In summary, we have the following
result.


THEOREM 1.7.3 


The product of two symmetric matrices is symmetric if and only if the matrices commute. 


EXAMPLE 5 Products of Symmetric Matrices


The first of the following equations shows a product of symmetric matrices that is not symmetric, and the 
second shows a product of symmetric matrices that is symmetric. We conclude that the factors in the first 
equation do not commute, but those in the second equation do. We leave it for you to verify that this is so. 


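$$
\begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} -4 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ -5 & 2 \end{bmatrix}
$$

$$
\begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} -4 & 3 \\ 3 & -1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}
$$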
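
Theorem 1.7.3 can be tested on small cases. The sketch below uses illustrative symmetric $2 \times 2$ matrices (the names `A`, `B`, `C` and the helper functions are assumptions made for this check) and reports, for each pair, whether the product is symmetric and whether the factors commute; the two answers should always agree.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(M):
    """Return the transpose of M."""
    return [list(col) for col in zip(*M)]

def is_symmetric(M):
    """True if M equals its own transpose."""
    return M == transpose(M)

A = [[1, 2], [2, 3]]
B = [[-4, 1], [1, 0]]     # symmetric, but does not commute with A
C = [[-4, 3], [3, -1]]    # symmetric, and commutes with A

results = []
for X, Y in [(A, B), (A, C)]:
    product_symmetric = is_symmetric(matmul(X, Y))
    commute = matmul(X, Y) == matmul(Y, X)
    results.append((product_symmetric, commute))
    print(matmul(X, Y), product_symmetric, commute)
```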


Invertibility of Symmetric Matrices 


In general, a symmetric matrix need not be invertible. For example, a diagonal matrix with a zero on the main diagonal is
symmetric but not invertible. However, the following theorem shows that if a symmetric matrix happens to be invertible, then
its inverse must also be symmetric.


THEOREM 1.7.4 


If $A$ is an invertible symmetric matrix, then $A^{-1}$ is symmetric.


Proof Assume that $A$ is symmetric and invertible. From Theorem 1.4.9 and the fact that $A = A^T$ we have

$$
(A^{-1})^T = (A^T)^{-1} = A^{-1}
$$

which proves that $A^{-1}$ is symmetric.


Products $AA^T$ and $A^TA$


Matrix products of the form $AA^T$ and $A^TA$ arise in a variety of applications. If $A$ is an $m \times n$ matrix, then $A^T$ is an $n \times m$
matrix, so the products $AA^T$ and $A^TA$ are both square matrices: the matrix $AA^T$ has size $m \times m$, and the matrix $A^TA$ has size
$n \times n$. Such products are always symmetric since

$$
(AA^T)^T = (A^T)^TA^T = AA^T \quad \text{and} \quad (A^TA)^T = A^T(A^T)^T = A^TA
$$


EXAMPLE 6 The Product of a Matrix and Its Transpose Is Symmetric 


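Let $A$ be the $2 \times 3$ matrix

$$
A = \begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix}
$$

Then

$$
A^TA = \begin{bmatrix} 1 & 3 \\ -2 & 0 \\ 4 & -5 \end{bmatrix} \begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix}
= \begin{bmatrix} 10 & -2 & -11 \\ -2 & 4 & -8 \\ -11 & -8 & 41 \end{bmatrix}
$$

$$
AA^T = \begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ -2 & 0 \\ 4 & -5 \end{bmatrix}
= \begin{bmatrix} 21 & -17 \\ -17 & 34 \end{bmatrix}
$$

Observe that $A^TA$ and $AA^T$ are symmetric as expected.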
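
The products in Example 6 are easy to recompute. The following sketch (helper names are illustrative) forms both products for the $2 \times 3$ matrix with rows $(1, -2, 4)$ and $(3, 0, -5)$ and compares each with its own transpose.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(M):
    """Return the transpose of M."""
    return [list(col) for col in zip(*M)]

A = [[1, -2, 4],
     [3, 0, -5]]

AtA = matmul(transpose(A), A)   # 3 x 3
AAt = matmul(A, transpose(A))   # 2 x 2

print(AtA, AtA == transpose(AtA))
print(AAt, AAt == transpose(AAt))
```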


Later in this text, we will obtain general conditions on $A$ under which $AA^T$ and $A^TA$ are invertible. However, in the special
case where $A$ is square, we have the following result.


THEOREM 1.7.5 


If $A$ is an invertible matrix, then $AA^T$ and $A^TA$ are also invertible.


Proof Since $A$ is invertible, so is $A^T$ by Theorem 1.4.9. Thus $AA^T$ and $A^TA$ are invertible, since they are the products of
invertible matrices.


Concept Review 

• Diagonal matrix
• Lower triangular matrix
• Upper triangular matrix
• Triangular matrix
• Symmetric matrix


Skills 

• Determine whether a diagonal matrix is invertible with no computations.
• Compute matrix products involving diagonal matrices by inspection.
• Determine whether a matrix is triangular.
• Understand how the transpose operation affects diagonal and triangular matrices.
• Understand how inversion affects diagonal and triangular matrices.
• Determine whether a matrix is a symmetric matrix.


Exercise Set 1.7 


In Exercises 1—4, determine whether the given matrix is invertible. 


33 


Answer: 
1 
5 0 
1 
naa 
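2. $\begin{bmatrix} 4 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 5 \end{bmatrix}$

3. $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & \frac{1}{3} \end{bmatrix}$

Answer:

$\begin{bmatrix} -1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & 3 \end{bmatrix}$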
4/-1 0 0 QO 
03 0 90 
00-3 690 
00 0 =2 


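5. $\begin{bmatrix} 3 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ -4 & 1 \\ 2 & 5 \end{bmatrix}$

Answer:

$\begin{bmatrix} 6 & 3 \\ 4 & -1 \\ 4 & 10 \end{bmatrix}$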
6 1 2 -51/74 9 9 
ae 0 03 0 
002 
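7. $\begin{bmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -3 \end{bmatrix} \begin{bmatrix} -3 & 2 & 0 & 4 & -4 \\ 1 & -5 & 3 & 0 & 3 \\ -6 & 2 & 2 & 2 & 2 \end{bmatrix}$

Answer:

$\begin{bmatrix} -15 & 10 & 0 & 20 & -20 \\ 2 & -10 & 6 & 0 & 6 \\ 18 & -6 & -6 & -6 & -6 \end{bmatrix}$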
8. 0 0 4-1 3])-3 0 


0 4//=-5 1 =2 


In Exercises 9-12, find $A^2$, $A^{-2}$, and $A^{-k}$ (where $k$ is any integer) by inspection.


9. ,_[1 0 
A= 


Answer: 


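11. $A = \begin{bmatrix} \frac{1}{2} & 0 & 0 \\ 0 & \frac{1}{3} & 0 \\ 0 & 0 & \frac{1}{4} \end{bmatrix}$

Answer:

$A^2 = \begin{bmatrix} \frac{1}{4} & 0 & 0 \\ 0 & \frac{1}{9} & 0 \\ 0 & 0 & \frac{1}{16} \end{bmatrix}, \quad
A^{-2} = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 16 \end{bmatrix}, \quad
A^{-k} = \begin{bmatrix} 2^k & 0 & 0 \\ 0 & 3^k & 0 \\ 0 & 0 & 4^k \end{bmatrix}$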
12. ae , TD 
0-4 00 
a= 0 oOo 3 0 
¢ 6° 0:2 


In Exercises 13-19, decide whether the given matrix is symmetric. 


13.| —-8 =—8 
Oo 60 
Answer: 


Not symmetric 
alk | 

he 2 
| 0 = 

—7 67 


Answer: 


Answer: 


Not symmetric 


18.| 2 —1 3 
-1 51 
a sd 
19.)9 0 1 
0 2 0 
3.0 0 


Answer: 


Not symmetric 


In Exercises 20-22, decide by inspection whether the given matrix is invertible. 


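20. $\begin{bmatrix} -1 & 2 & 4 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix}$

21. $\begin{bmatrix} 0 & 1 & -2 & 5 \\ 0 & 1 & 5 & 6 \\ 0 & 0 & -3 & 1 \\ 0 & 0 & 0 & 5 \end{bmatrix}$

Answer:
Not invertible

22. $\begin{bmatrix} 2 & 0 & 0 & 0 \\ -3 & -1 & 0 & 0 \\ -4 & -6 & 0 & 0 \\ 0 & 3 & 8 & -5 \end{bmatrix}$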

In Exercises 23—24, find all values of the unknown constant(s) in order for A to be symmetric. 


Maeles o] 


a+5 =1 
Answer: 
a=—-8 
24. $A = \begin{bmatrix} 2 & a - 2b + 2c & 2a + b + c \\ 3 & 5 & a + c \\ 0 & -2 & 7 \end{bmatrix}$


In Exercises 25-26, find all values of x in order for A to be invertible. 


25. $A = \begin{bmatrix} x - 1 & x^2 & x^4 \\ 0 & x + 2 & x^3 \\ 0 & 0 & x - 4 \end{bmatrix}$

Answer:
$x \neq 1, -2, 4$
6 x= 0 0 
- _1 
A= x x 3 0 
x? x x-4 


27 1 0 0 
A=|0 -1 
a es | 


Answer: 


1 oO 9 
Oo =-1 0 
Oo 0 =1 
28. 90 0 
At=10 40 
00 1 
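29. Verify Theorem 1.7.1(b) for the product $AB$, where

$$
A = \begin{bmatrix} -1 & 2 & 5 \\ 0 & 1 & 3 \\ 0 & 0 & -4 \end{bmatrix}, \quad
B = \begin{bmatrix} 2 & -8 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{bmatrix}
$$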

30. Verify Theorem 1.7.1(d) for the matrices $A$ and $B$ in Exercise 29.
31. Verify Theorem 1.7.4 for the given matrix A. 


® a=| 2 “ak 


mt 3 
(b) {2-3 
A=|-2 1 7 
es ae 


32. Let $A$ be an $n \times n$ symmetric matrix.
(a) Show that $A^2$ is symmetric.
(b) Show that $2A^2 - 3A + I$ is symmetric.
33. Prove: If $A^TA = A$, then $A$ is symmetric and $A = A^2$.
34. Find all $3 \times 3$ diagonal matrices $A$ that satisfy $A^2 - 3A - 4I = 0$.
35. Let $A = [a_{ij}]$ be an $n \times n$ matrix. Determine whether $A$ is symmetric.
(a) $a_{ij} = i^2 + j^2$
(b) $a_{ij} = i^2 - j^2$
(c) $a_{ij} = 2i + 2j$
(d) $a_{ij} = 2i^2 + 2j^3$


Answer:


(a) Yes
(b) No (unless $n = 1$)
(c) Yes
(d) No (unless $n = 1$)
36. On the basis of your experience with Exercise 35, devise a general test that can be applied to a formula for $a_{ij}$ to determine
whether $A = [a_{ij}]$ is symmetric.


37. A square matrix $A$ is called skew-symmetric if $A^T = -A$.


Prove: 


(a) If $A$ is an invertible skew-symmetric matrix, then $A^{-1}$ is skew-symmetric.


(b) If $A$ and $B$ are skew-symmetric matrices, then so are $A^T$, $A + B$, $A - B$, and $kA$ for any scalar $k$.


(c) Every square matrix $A$ can be expressed as the sum of a symmetric matrix and a skew-symmetric matrix. [Hint: Note
the identity $A = \frac{1}{2}(A + A^T) + \frac{1}{2}(A - A^T)$.]


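In Exercises 38-39, fill in the missing entries (marked with $\times$) to produce a skew-symmetric matrix.


38. $A = \begin{bmatrix} \times & \times & 4 \\ 0 & \times & \times \\ \times & -1 & \times \end{bmatrix}$

39. $A = \begin{bmatrix} \times & 0 & \times \\ \times & \times & -4 \\ 8 & \times & \times \end{bmatrix}$

Answer:

$\begin{bmatrix} 0 & 0 & -8 \\ 0 & 0 & -4 \\ 8 & 4 & 0 \end{bmatrix}$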


40. Find all values of $a$, $b$, $c$, and $d$ for which $A$ is skew-symmetric.

$$
A = \begin{bmatrix} 0 & 2a - 3b + c & 3a - 5b + 5c \\ -2 & 0 & 5a - 8b + 6c \\ -3 & -5 & d \end{bmatrix}
$$

41. We showed in the text that the product of symmetric matrices is symmetric if and only if the matrices commute. Is the
product of commuting skew-symmetric matrices skew-symmetric? Explain. [Note: See Exercise 37 for the definition of
skew-symmetric.]


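42. If the $n \times n$ matrix $A$ can be expressed as $A = LU$, where $L$ is a lower triangular matrix and $U$ is an upper triangular
matrix, then the linear system $A\mathbf{x} = \mathbf{b}$ can be expressed as $LU\mathbf{x} = \mathbf{b}$ and can be solved in two steps:


Step 1. Let $U\mathbf{x} = \mathbf{y}$, so that $LU\mathbf{x} = \mathbf{b}$ can be expressed as $L\mathbf{y} = \mathbf{b}$. Solve this system.
Step 2. Solve the system $U\mathbf{x} = \mathbf{y}$ for $\mathbf{x}$.


In each part, use this two-step method to solve the given system.


(a) $\begin{bmatrix} 1 & 0 & 0 \\ -2 & 3 & 0 \\ 2 & 4 & 1 \end{bmatrix} \begin{bmatrix} 2 & -1 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ -2 \\ 0 \end{bmatrix}$

(b) $\begin{bmatrix} 2 & 0 & 0 \\ 4 & 1 & 0 \\ -3 & -2 & 3 \end{bmatrix} \begin{bmatrix} 3 & -5 & 2 \\ 0 & 4 & 1 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 \\ -5 \\ 2 \end{bmatrix}$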
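
The two-step method described in Exercise 42 amounts to one forward substitution followed by one back substitution. The following Python sketch shows the idea on a small made-up $2 \times 2$ factorization rather than on the systems in parts (a) and (b); the function names and the sample data are assumptions chosen for illustration, and both triangular factors are assumed to have nonzero diagonal entries.

```python
from fractions import Fraction

def forward_substitution(L, b):
    """Step 1: solve L y = b for a lower triangular L, working from the top row down."""
    y = []
    for i in range(len(b)):
        s = sum(L[i][k] * y[k] for k in range(i))
        y.append((Fraction(b[i]) - s) / L[i][i])
    return y

def back_substitution(U, y):
    """Step 2: solve U x = y for an upper triangular U, working from the bottom row up."""
    n = len(y)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (Fraction(y[i]) - s) / U[i][i]
    return x

def solve_lu(L, U, b):
    """Solve (LU) x = b by the two-step method."""
    return back_substitution(U, forward_substitution(L, b))

# Hypothetical data: A = LU = [[2, 8], [1, 19]]
L = [[2, 0], [1, 3]]
U = [[1, 4], [0, 5]]
b = [2, 16]
print(solve_lu(L, U, b))
```

Each step touches only the entries on one side of the main diagonal, which is why triangular systems are so cheap to solve.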
43. Find an upper triangular matrix that satisfies

$$
A^3 = \begin{bmatrix} 1 & 30 \\ 0 & -8 \end{bmatrix}
$$

Answer:

$$
A = \begin{bmatrix} 1 & 10 \\ 0 & -2 \end{bmatrix}
$$


True-False Exercises 
In parts (a)—(m) determine whether the statement is true or false, and justify your answer. 


(a) The transpose of a diagonal matrix is a diagonal matrix. 


Answer: 


True 


(b) The transpose of an upper triangular matrix is an upper triangular matrix. 
Answer: 


False 


(c) The sum of an upper triangular matrix and a lower triangular matrix is a diagonal matrix. 
Answer: 


False 


(d) All entries of a symmetric matrix are determined by the entries occurring on and above the main diagonal. 
Answer: 


True 


(e) All entries of an upper triangular matrix are determined by the entries occurring on and above the main diagonal. 
Answer: 


True 


(f) The inverse of an invertible lower triangular matrix is an upper triangular matrix. 
Answer: 


False 


(g) A diagonal matrix is invertible if and only if all of its diagonal entries are positive. 
Answer: 


False 


(h) The sum of a diagonal matrix and a lower triangular matrix is a lower triangular matrix. 
Answer: 


True 


(i) A matrix that is both symmetric and upper triangular must be a diagonal matrix. 
Answer: 


True 


(j) If $A$ and $B$ are $n \times n$ matrices such that $A + B$ is symmetric, then $A$ and $B$ are symmetric.
Answer: 


False 


(k) If $A$ and $B$ are $n \times n$ matrices such that $A + B$ is upper triangular, then $A$ and $B$ are upper triangular.
Answer: 


False 


(l) If $A^2$ is a symmetric matrix, then $A$ is a symmetric matrix.


Answer: 


False 


(m) If $kA$ is a symmetric matrix for some $k \neq 0$, then $A$ is a symmetric matrix.
Answer: 


True 




1.8 Applications of Linear Systems 


In this section we will discuss some relatively brief applications of linear systems. These are but a small sample of the wide 
variety of real-world problems to which our study of linear systems is applicable. 


Network Analysis 


The concept of a network appears in a variety of applications. Loosely stated, a network is a set of branches through which 
something “flows.” For example, the branches might be electrical wires through which electricity flows, pipes through 
which water or oil flows, traffic lanes through which vehicular traffic flows, or economic linkages through which money 
flows, to name a few possibilities. 


In most networks, the branches meet at points, called nodes or junctions, where the flow divides. For example, in an 
electrical network, nodes occur where three or more wires join, in a traffic network they occur at street intersections, and in 
a financial network they occur at banking centers where incoming money is distributed to individuals or other institutions. 


In the study of networks, there is generally some numerical measure of the rate at which the medium flows through a 
branch. For example, the flow rate of electricity is often measured in amperes, the flow rate of water or oil in gallons per 
minute, the flow rate of traffic in vehicles per hour, and the flow rate of European currency in millions of Euros per day. 
We will restrict our attention to networks in which there is flow conservation at each node, by which we mean that the rate 
of flow into any node is equal to the rate of flow out of that node. This ensures that the flow medium does not build up at 
the nodes and block the free movement of the medium through the network. 


A common problem in network analysis is to use known flow rates in certain branches to find the flow rates in all of the 
branches. Here is an example. 


EXAMPLE 1 Network Analysis Using Linear Systems 


Figure 1.8.1 shows a network with four nodes in which the flow rate and direction of flow in certain 
branches are known. Find the flow rates and directions of flow in the remaining branches. 


Figure 1.8.1
Solution As illustrated in Figure 1.8.2, we have assigned arbitrary directions to the unknown flow rates 


$x_1$, $x_2$, and $x_3$. We need not be concerned if some of the directions are incorrect, since an incorrect direction
will be signaled by a negative value for the flow rate when we solve for the unknowns. 


Figure 1.8.2


It follows from the conservation of flow at node $A$ that

$$
x_1 + x_2 = 30
$$

Similarly, at the other nodes we have

$$
\begin{aligned}
x_2 + x_3 &= 35 && \text{(node } B\text{)} \\
x_3 + 15 &= 60 && \text{(node } C\text{)} \\
x_1 + 15 &= 55 && \text{(node } D\text{)}
\end{aligned}
$$

These four conditions produce the linear system

$$
\begin{aligned}
x_1 + x_2 &= 30 \\
x_2 + x_3 &= 35 \\
x_3 &= 45 \\
x_1 &= 40
\end{aligned}
$$

which we can now try to solve for the unknown flow rates. In this particular case the system is sufficiently
simple that it can be solved by inspection (work from the bottom up). We leave it for you to confirm that the
solution is

$$
x_1 = 40, \quad x_2 = -10, \quad x_3 = 45
$$


The fact that $x_2$ is negative tells us that the direction assigned to that flow in Figure 1.8.2 is incorrect; that is,
the flow in that branch is into node A. 
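
The bottom-up solution of Example 1 can be written out in a few lines. This is only a sketch of the arithmetic, using the node equations stated above; the dictionary `flows` and the list `reversed_branches` are illustrative names.

```python
# Node equations from Example 1:
#   x1 + x2 = 30,  x2 + x3 = 35,  x3 + 15 = 60,  x1 + 15 = 55
x3 = 60 - 15          # node C
x1 = 55 - 15          # node D
x2 = 30 - x1          # node A

# The remaining equation (node B) must hold as well.
assert x2 + x3 == 35

flows = {"x1": x1, "x2": x2, "x3": x3}
# A negative value means the assumed direction of that branch was wrong.
reversed_branches = [name for name, value in flows.items() if value < 0]
print(flows, reversed_branches)
```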


EXAMPLE 2 Design of Traffic Patterns 


The network in Figure 1.8.3 shows a proposed plan for the traffic flow around a new park that will house the 
Liberty Bell in Philadelphia, Pennsylvania. The plan calls for a computerized traffic light at the north exit on 
Fifth Street, and the diagram indicates the average number of vehicles per hour that are expected to flow in 
and out of the streets that border the complex. All streets are one-way. 


(a) How many vehicles per hour should the traffic light let through to ensure that the average number of 
vehicles per hour flowing into the complex is the same as the average number of vehicles flowing out? 


(b) Assuming that the traffic light has been set to balance the total flow in and out of the complex, what can 
you say about the average number of vehicles per hour that will flow along the streets that border the 
complex? 


Figure 1.8.3 [Panel (a): street plan around Liberty Park, bordered by Market St. and Chestnut St., with the traffic light at the north exit. Panel (b): the corresponding network with intersections $A$, $B$, $C$, $D$ and flows $x$, $x_1$, $x_2$, $x_3$, $x_4$.]
Solution 


(a) If, as indicated in Figure 1.8.3b, we let $x$ denote the number of vehicles per hour that the traffic light must
let through, then the total number of vehicles per hour that flow in and out of the complex will be


Flowing in: $500 + 400 + 600 + 200 = 1700$
Flowing out: $x + 700 + 400$


Equating the flows in and out shows that the traffic light should let $x = 600$ vehicles per hour pass
through.


(b) To avoid traffic congestion, the flow in must equal the flow out at each intersection. For this to happen,
the following conditions must be satisfied:


Intersection    Flow In              Flow Out
A               $400 + 600$    =     $x_1 + x_2$
B               $x_2 + x_3$    =     $400 + x$
C               $500 + 200$    =     $x_3 + x_4$
D               $x_1 + x_4$    =     $700$


Thus, with $x = 600$, as computed in part (a), we obtain the following linear system:

$$
\begin{aligned}
x_1 + x_2 &= 1000 \\
x_2 + x_3 &= 1000 \\
x_3 + x_4 &= 700 \\
x_1 + x_4 &= 700
\end{aligned}
$$

We leave it for you to show that the system has infinitely many solutions and that these are given by the
parametric equations

$$
x_1 = 700 - t, \quad x_2 = 300 + t, \quad x_3 = 700 - t, \quad x_4 = t \tag{1}
$$

However, the parameter $t$ is not completely arbitrary here, since there are physical constraints to be
considered. For example, the average flow rates must be nonnegative since we have assumed the streets
to be one-way, and a negative flow rate would indicate a flow in the wrong direction. This being the
case, we see from (1) that $t$ can be any real number that satisfies $0 \le t \le 700$, which implies that the
average flow rates along the streets will fall in the ranges

$$
0 \le x_1 \le 700, \quad 300 \le x_2 \le 1000, \quad 0 \le x_3 \le 700, \quad 0 \le x_4 \le 700
$$
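
The role of the parameter can be explored numerically. The sketch below (function names are illustrative) substitutes the parametric equations (1) into the four intersection equations for a range of sample values of $t$, and then keeps only those values for which every flow rate is nonnegative.

```python
def flows(t):
    """Flow rates (x1, x2, x3, x4) given by the parametric equations (1)."""
    return (700 - t, 300 + t, 700 - t, t)

def satisfies_system(x1, x2, x3, x4):
    """Check the four intersection equations with x = 600."""
    return (x1 + x2 == 1000 and x2 + x3 == 1000
            and x3 + x4 == 700 and x1 + x4 == 700)

# Every sampled parameter value solves the system...
candidates = range(-100, 801, 100)
assert all(satisfies_system(*flows(t)) for t in candidates)

# ...but only some of them keep all four flows nonnegative.
feasible = [t for t in candidates if min(flows(t)) >= 0]
print(feasible)
```

The feasible sample values run from 0 to 700, consistent with the constraint on $t$ derived above.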


Electrical Circuits 


Next, we will show how network analysis can be used to analyze electrical circuits consisting of batteries and resistors. A 
battery is a source of electric energy, and a resistor, such as a lightbulb, is an element that dissipates electric energy. Figure 
1.8.4 shows a schematic diagram of a circuit with one battery, one resistor, and a switch, each drawn with its standard
schematic symbol. The battery has a positive pole (+) and a negative pole (−). When the switch is closed,
electrical current is considered to flow from the positive pole of the battery, through the resistor, and back to the negative
pole (indicated by the arrowhead in the figure).


Figure 1.8.4


Electrical current, which is a flow of electrons through wires, behaves much like the flow of water through pipes. A battery 
acts like a pump that creates “electrical pressure” to increase the flow rate of electrons, and a resistor acts like a restriction 
in a pipe that reduces the flow rate of electrons. The technical term for electrical pressure is electrical potential; it is 
commonly measured in volts (V). The degree to which a resistor reduces the electrical potential is called its resistance and 
is commonly measured in ohms (Ω). The rate of flow of electrons in a wire is called current and is commonly measured in
amperes (also called amps) (A). The precise effect of a resistor is given by the following law: 


Ohm's Law 


If a current of $I$ amperes passes through a resistor with a resistance of $R$ ohms, then there is a resulting drop of $E$
volts in electrical potential that is the product of the current and resistance; that is,

$$
E = IR
$$


A typical electrical network will have multiple batteries and resistors joined by some configuration of wires. A point at 
which three or more wires in a network are joined is called a node (or junction point). A branch is a wire connecting two 
nodes, and a closed loop is a succession of connected branches that begin and end at the same node. For example, the 
electrical network in Figure 1.8.5 has two nodes and three closed loops— two inner loops and one outer loop. As current 
flows through an electrical network, it undergoes increases and decreases in electrical potential, called voltage rises and 
voltage drops, respectively. The behavior of the current at the nodes and around closed loops is governed by two 
fundamental laws: 


Figure 1.8.5


Kirchhoff's Current Law 


The sum of the currents flowing into any node is equal to the sum of the currents flowing out. 


Kirchhoff's Voltage Law 


In one traversal of any closed loop, the sum of the voltage rises equals the sum of the voltage drops. 


Kirchhoff's current law is a restatement of the principle of flow conservation at a node that was stated for general networks. 
Thus, for example, the currents at the top node in Figure 1.8.6 satisfy the equation $I_1 = I_2 + I_3$.


Figure 1.8.6 


In circuits with multiple loops and batteries there is usually no way to tell in advance which way the currents are flowing, 
so the usual procedure in circuit analysis is to assign arbitrary directions to the current flows in the branches and let the 
mathematical computations determine whether the assignments are correct. In addition to assigning directions to the 
current flows, Kirchhoff's voltage law requires a direction of travel for each closed loop. The choice is arbitrary, but for 
consistency we will always take this direction to be clockwise (Figure 1.8.7). We also make the following conventions: 


e A voltage drop occurs at a resistor if the direction assigned to the current through the resistor is the same as the direction 
assigned to the loop, and a voltage rise occurs at a resistor if the direction assigned to the current through the resistor is 
the opposite to that assigned to the loop. 


e A voltage rise occurs at a battery if the direction assigned to the loop is from — to + through the battery, and a voltage 
drop occurs at a battery if the direction assigned to the loop is from + to — through the battery. 


If you follow these conventions when calculating currents, then those currents whose directions were assigned correctly 
will have positive values and those whose directions were assigned incorrectly will have negative values. 


Figure 1.8.7 Clockwise closed-loop convention with arbitrary direction assignments to currents in the branches


EXAMPLE 3 A Circuit with One Closed Loop


Determine the current $I$ in the circuit shown in Figure 1.8.8.


Figure 1.8.8 [Circuit with a 6 V battery and a 3 Ω resistor carrying the current $I$]


Solution Since the direction assigned to the current through the resistor is the same as the direction of the
loop, there is a voltage drop at the resistor. By Ohm's law this voltage drop is $E = IR = 3I$. Also, since the
direction assigned to the loop is from − to + through the battery, there is a voltage rise of 6 volts at the
battery. Thus, it follows from Kirchhoff's voltage law that

$$
3I = 6
$$

from which we conclude that the current is $I = 2$ A. Since $I$ is positive, the direction assigned to the current
flow is correct.


EXAMPLE 4 A Circuit with Three Closed Loops 


Determine the currents $I_1$, $I_2$, and $I_3$ in the circuit shown in Figure 1.8.9.


Figure 1.8.9


Solution Using the assigned directions for the currents, Kirchhoff's current law provides one equation for
each node:

Node    Current In           Current Out
A       $I_1 + I_2$    =     $I_3$
B       $I_3$          =     $I_1 + I_2$


However, these equations are really the same, since both can be expressed as

$$
I_1 + I_2 - I_3 = 0 \tag{2}
$$


Gustav Kirchhoff (1824-1887) 


Historical Note The German physicist Gustav Kirchhoff was a student of Gauss. His work on 
Kirchhoff's laws, announced in 1854, was a major advance in the calculation of currents, voltages, 
and resistances of electrical circuits. Kirchhoff was severely disabled and spent most of his life on 
crutches or in a wheelchair. 

[Image: © SSPL/The Image Works]


To find unique values for the currents we will need two more equations, which we will obtain from 
Kirchhoff's voltage law. We can see from the network diagram that there are three closed loops, a left inner 
loop containing the 50 V battery, a right inner loop containing the 30 V battery, and an outer loop that 
contains both batteries. Thus, Kirchhoff's voltage law will actually produce three equations. With a 
clockwise traversal of the loops, the voltage rises and drops in these loops are as follows: 


                     Voltage Rises              Voltage Drops
Left Inside Loop     $50$                       $5I_1 + 20I_3$
Right Inside Loop    $30 + 10I_2 + 20I_3$       $0$
Outside Loop         $30 + 50 + 10I_2$          $5I_1$

These conditions can be rewritten as

$$
\begin{aligned}
5I_1 + 20I_3 &= 50 \\
10I_2 + 20I_3 &= -30 \\
5I_1 - 10I_2 &= 80
\end{aligned} \tag{3}
$$

However, the last equation is superfluous, since it is the difference of the first two. Thus, if we combine (2)
and the first two equations in (3), we obtain the following linear system of three equations in the three
unknown currents:

$$
\begin{aligned}
I_1 + I_2 - I_3 &= 0 \\
5I_1 + 20I_3 &= 50 \\
10I_2 + 20I_3 &= -30
\end{aligned}
$$

We leave it for you to solve this system and show that $I_1 = 6$ A, $I_2 = -5$ A, and $I_3 = 1$ A. The fact that $I_2$
is negative tells us that the direction of this current is opposite to that indicated in Figure 1.8.9.
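
The final system in Example 4 is small enough to solve by hand, but it can also be handed to a short elimination routine. The sketch below is one way to do this, assuming the coefficient matrix is square and the system has a unique solution; the function name `solve` is illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def solve(A, b):
    """Solve a square system A x = b by Gauss-Jordan elimination (assumes a unique solution)."""
    n = len(A)
    # Augmented matrix [A | b] with exact rational entries
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]       # make the pivot equal to 1
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [row[-1] for row in M]

# Node equation (2) followed by the two inner-loop equations from (3)
A = [[1, 1, -1],
     [5, 0, 20],
     [0, 10, 20]]
b = [0, 50, -30]

I1, I2, I3 = solve(A, b)
print(I1, I2, I3)
```

The computed currents also satisfy the outer-loop equation $5I_1 - 10I_2 = 80$, which confirms that it carries no new information.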


Balancing Chemical Equations 


Chemical compounds are represented by chemical formulas that describe the atomic makeup of their molecules. For 
example, water is composed of two hydrogen atoms and one oxygen atom, so its chemical formula is $\mathrm{H_2O}$; and stable
oxygen is composed of two oxygen atoms, so its chemical formula is $\mathrm{O_2}$.


When chemical compounds are combined under the right conditions, the atoms in their molecules rearrange to form new 
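compounds. For example, when methane burns, the methane ($\mathrm{CH_4}$) and stable oxygen ($\mathrm{O_2}$) react to form carbon dioxide
($\mathrm{CO_2}$) and water ($\mathrm{H_2O}$). This is indicated by the chemical equation

$$
\mathrm{CH_4} + \mathrm{O_2} \longrightarrow \mathrm{CO_2} + \mathrm{H_2O} \tag{4}
$$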


The molecules to the left of the arrow are called the reactants and those to the right the products. In this equation the plus 
signs serve to separate the molecules and are not intended as algebraic operations. However, this equation does not tell the 
whole story, since it fails to account for the proportions of molecules required for a complete reaction (no reactants left 
over). For example, we can see from the right side of 4 that to produce one molecule of carbon dioxide and one molecule 
of water, one needs three oxygen atoms for each carbon atom. However, from the left side of 4 we see that one molecule of 
methane and one molecule of stable oxygen have only two oxygen atoms for each carbon atom. Thus, on the reactant side 
the ratio of methane to stable oxygen cannot be one-to-one in a complete reaction. 


A chemical equation is said to be balanced if for each type of atom in the reaction, the same number of atoms appears on 
each side of the arrow. For example, the balanced version of Equation 4 is 


$$
\mathrm{CH_4} + 2\mathrm{O_2} \longrightarrow \mathrm{CO_2} + 2\mathrm{H_2O} \tag{5}
$$


by which we mean that one methane molecule combines with two stable oxygen molecules to produce one carbon dioxide 
molecule and two water molecules. In theory, one could multiply this equation through by any positive integer. For 
example, multiplying through by 2 yields the balanced chemical equation 

$$
2\mathrm{CH_4} + 4\mathrm{O_2} \longrightarrow 2\mathrm{CO_2} + 4\mathrm{H_2O}
$$


However, the standard convention is to use the smallest positive integers that will balance the equation. 


Equation 4 is sufficiently simple that it could have been balanced by trial and error, but for more complicated chemical 
equations we will need a systematic method. There are various methods that can be used, but we will give one that uses 
systems of linear equations. To illustrate the method let us reexamine Equation 4. To balance this equation we must find 
positive integers, $x_1$, $x_2$, $x_3$, and $x_4$ such that

$$
x_1(\mathrm{CH_4}) + x_2(\mathrm{O_2}) \longrightarrow x_3(\mathrm{CO_2}) + x_4(\mathrm{H_2O}) \tag{6}
$$


For each of the atoms in the equation, the number of atoms on the left must be equal to the number of atoms on the right. 
Expressing this in tabular form we have 


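            Left Side          Right Side
Carbon      $x_1$        =     $x_3$
Hydrogen    $4x_1$       =     $2x_4$
Oxygen      $2x_2$       =     $2x_3 + x_4$

from which we obtain the homogeneous linear system

$$
\begin{aligned}
x_1 - x_3 &= 0 \\
4x_1 - 2x_4 &= 0 \\
2x_2 - 2x_3 - x_4 &= 0
\end{aligned}
$$

The augmented matrix for this system is

$$
\begin{bmatrix} 1 & 0 & -1 & 0 & 0 \\ 4 & 0 & 0 & -2 & 0 \\ 0 & 2 & -2 & -1 & 0 \end{bmatrix}
$$

We leave it for you to show that the reduced row echelon form of this matrix is

$$
\begin{bmatrix} 1 & 0 & 0 & -\frac{1}{2} & 0 \\ 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 1 & -\frac{1}{2} & 0 \end{bmatrix}
$$

from which we conclude that the general solution of the system is

$$
x_1 = t/2, \quad x_2 = t, \quad x_3 = t/2, \quad x_4 = t
$$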


where $t$ is arbitrary. The smallest positive integer values for the unknowns occur when we let $t = 2$, so the equation can be
balanced by letting $x_1 = 1$, $x_2 = 2$, $x_3 = 1$, $x_4 = 2$. This agrees with our earlier conclusions, since substituting these
values into Equation 6 yields Equation 5.
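
The balancing procedure just described can be automated. The following Python sketch row-reduces the coefficient matrix of the homogeneous system, sets the free variable to 1, and then clears denominators. It assumes, as in the methane example, that the last unknown is the only free variable; the function names `rref` and `balance` are illustrative, not part of the text.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def rref(rows):
    """Return the reduced row echelon form of a matrix and its list of pivot columns."""
    M = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue                                  # no pivot in this column
        M[r], M[p] = M[p], M[r]
        M[r] = [v / M[r][c] for v in M[r]]            # leading 1
        for i in range(len(M)):
            if i != r:
                f = M[i][c]
                M[i] = [v - f * w for v, w in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == len(M):
            break
    return M, pivots

def balance(coeff):
    """Smallest integer solution of the homogeneous system, assuming one free variable (the last)."""
    M, pivots = rref(coeff)
    n = len(coeff[0])
    assert pivots == list(range(n - 1)), "sketch assumes the last unknown is the only free variable"
    x = [-M[i][n - 1] for i in range(n - 1)] + [Fraction(1)]    # general solution with t = 1
    scale = reduce(lambda a, b: a * b // gcd(a, b), (v.denominator for v in x), 1)
    ints = [int(v * scale) for v in x]                          # clear denominators
    g = reduce(gcd, ints)
    return [v // g for v in ints]

# Rows: carbon, hydrogen, oxygen; columns: x1 (CH4), x2 (O2), x3 (CO2), x4 (H2O)
methane = [[1, 0, -1, 0],
           [4, 0, 0, -2],
           [0, 2, -2, -1]]
print(balance(methane))
```

Because the right-hand sides are all zero, the sketch works with the coefficient matrix alone; the zero column of the augmented matrix never changes under row operations.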


EXAMPLE 5 Balancing Chemical Equations Using Linear Systems 


Balance the chemical equation 
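\[ \mathrm{HCl} + \mathrm{Na_3PO_4} \longrightarrow \mathrm{H_3PO_4} + \mathrm{NaCl} \]
[hydrochloric acid] + [sodium phosphate] → [phosphoric acid] + [sodium chloride]


Solution Let \(x_1\), \(x_2\), \(x_3\), and \(x_4\) be positive integers that balance the equation

\[ x_1(\mathrm{HCl}) + x_2(\mathrm{Na_3PO_4}) \longrightarrow x_3(\mathrm{H_3PO_4}) + x_4(\mathrm{NaCl}) \tag{7} \]

Equating the number of atoms of each type on the two sides yields

\[
\begin{array}{rcll}
1x_1 &=& 3x_3 & \text{Hydrogen (H)} \\
1x_1 &=& 1x_4 & \text{Chlorine (Cl)} \\
3x_2 &=& 1x_4 & \text{Sodium (Na)} \\
1x_2 &=& 1x_3 & \text{Phosphorus (P)} \\
4x_2 &=& 4x_3 & \text{Oxygen (O)}
\end{array}
\]

from which we obtain the homogeneous linear system

\[
\begin{aligned}
x_1 - 3x_3 &= 0 \\
x_1 - x_4 &= 0 \\
3x_2 - x_4 &= 0 \\
x_2 - x_3 &= 0 \\
4x_2 - 4x_3 &= 0
\end{aligned}
\]

We leave it for you to show that the reduced row echelon form of the augmented matrix for this system is

\[
\begin{bmatrix}
1 & 0 & 0 & -1 & 0 \\
0 & 1 & 0 & -\tfrac{1}{3} & 0 \\
0 & 0 & 1 & -\tfrac{1}{3} & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

from which we conclude that the general solution of the system is

\[ x_1 = t, \quad x_2 = t/3, \quad x_3 = t/3, \quad x_4 = t \]

where \(t\) is arbitrary. To obtain the smallest positive integers that balance the equation, we let \(t = 3\), in which
case we obtain \(x_1 = 3\), \(x_2 = 1\), \(x_3 = 1\), and \(x_4 = 3\). Substituting these values in (7) produces the balanced
equation

\[ 3\,\mathrm{HCl} + \mathrm{Na_3PO_4} \longrightarrow \mathrm{H_3PO_4} + 3\,\mathrm{NaCl} \]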
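A balanced equation is easy to double-check by counting atoms on each side. The sketch below does this for the result of Example 5; the dictionaries of atom counts are typed in by hand from the four formulas, and the names `total_atoms`, `left`, and `right` are our own.

```python
# Atom counts per molecule, written out by hand from the formulas in Example 5.
HCL = {"H": 1, "Cl": 1}
NA3PO4 = {"Na": 3, "P": 1, "O": 4}
H3PO4 = {"H": 3, "P": 1, "O": 4}
NACL = {"Na": 1, "Cl": 1}


def total_atoms(terms):
    """Sum atom counts over (coefficient, molecule) pairs on one side of an equation."""
    totals = {}
    for coefficient, molecule in terms:
        for element, count in molecule.items():
            totals[element] = totals.get(element, 0) + coefficient * count
    return totals


# 3 HCl + 1 Na3PO4 on the left, 1 H3PO4 + 3 NaCl on the right.
left = total_atoms([(3, HCL), (1, NA3PO4)])
right = total_atoms([(1, H3PO4), (3, NACL)])
print(left == right)  # True
```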


Polynomial Interpolation 


An important problem in various applications is to find a polynomial whose graph passes through a specified set of points 
in the plane; this is called an interpolating polynomial for the points. The simplest example of such a problem is to find a 
linear polynomial 


\[ p(x) = ax + b \tag{8} \]


whose graph passes through two known distinct points, \((x_1, y_1)\) and \((x_2, y_2)\), in the \(xy\)-plane (Figure 1.8.10). You have
probably encountered various methods in analytic geometry for finding the equation of a line through two points, but here 
we will give a method based on linear systems that can be adapted to general polynomial interpolation. 


Figure 1.8.10 The line \(y = ax + b\)


The graph of (8) is the line \(y = ax + b\), and for this line to pass through the points \((x_1, y_1)\) and \((x_2, y_2)\), we must have

\[ y_1 = ax_1 + b \quad \text{and} \quad y_2 = ax_2 + b \]

Therefore, the unknown coefficients \(a\) and \(b\) can be obtained by solving the linear system

\[
\begin{aligned}
ax_1 + b &= y_1 \\
ax_2 + b &= y_2
\end{aligned}
\]

We don't need any fancy methods to solve this system: the value of \(a\) can be obtained by subtracting the equations to
eliminate \(b\), and then the value of \(a\) can be substituted into either equation to find \(b\). We leave it as an exercise for you to
find \(a\) and \(b\) and then show that they can be expressed in the form

\[ a = \frac{y_2 - y_1}{x_2 - x_1} \quad \text{and} \quad b = \frac{y_1 x_2 - y_2 x_1}{x_2 - x_1} \tag{9} \]

provided \(x_1 \neq x_2\). Thus, for example, the line \(y = ax + b\) that passes through the points

\[ (2, 1) \quad \text{and} \quad (5, 4) \]

can be obtained by taking \((x_1, y_1) = (2, 1)\) and \((x_2, y_2) = (5, 4)\), in which case (9) yields

\[ a = \frac{4 - 1}{5 - 2} = 1 \quad \text{and} \quad b = \frac{(1)(5) - (4)(2)}{5 - 2} = -1 \]

Therefore, the equation of the line is

\[ y = x - 1 \]


(Figure 1.8.11). 


Figure 1.8.11 
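The slope and intercept formulas for the line through two points translate directly into a few lines of code. This is only an illustrative sketch: the function name `line_through` is our own, and it assumes integer (or rational) coordinates so that exact fractions can be used.

```python
from fractions import Fraction


def line_through(p1, p2):
    """Return (a, b) for the line y = ax + b through two points with distinct x-coordinates."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points must have distinct x-coordinates")
    a = Fraction(y2 - y1, x2 - x1)            # slope
    b = Fraction(y1 * x2 - y2 * x1, x2 - x1)  # intercept
    return a, b


a, b = line_through((2, 1), (5, 4))
print(a, b)  # 1 -1
```

For the points (2, 1) and (5, 4) this reproduces the line \(y = x - 1\) found above.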


Now let us consider the more general problem of finding a polynomial whose graph passes through n points with distinct 
x-coordinates 


\[ (x_1, y_1),\ (x_2, y_2),\ (x_3, y_3),\ \ldots,\ (x_n, y_n) \tag{10} \]

Since there are \(n\) conditions to be satisfied, intuition suggests that we should begin by looking for a polynomial of the form

\[ p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} \tag{11} \]

since a polynomial of this form has \(n\) coefficients that are at our disposal to satisfy the \(n\) conditions. However, we want to
allow for cases where the points may lie on a line or have some other configuration that would make it possible to use a
polynomial whose degree is less than \(n - 1\); thus, we allow for the possibility that \(a_{n-1}\) and other coefficients in (11) may
be zero.


The following theorem, which we will prove later in the text, is the basic result on polynomial interpolation. 


THEOREM 1.8.1 Polynomial Interpolation 


Given any \(n\) points in the \(xy\)-plane that have distinct \(x\)-coordinates, there is a unique polynomial of degree \(n - 1\)
or less whose graph passes through those points.


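Let us now consider how we might go about finding the interpolating polynomial (11) whose graph passes through the points
in (10). Since the graph of this polynomial is the graph of the equation

\[ y = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} \tag{12} \]

it follows that the coordinates of the points must satisfy

\[
\begin{aligned}
a_0 + a_1 x_1 + a_2 x_1^2 + \cdots + a_{n-1} x_1^{n-1} &= y_1 \\
a_0 + a_1 x_2 + a_2 x_2^2 + \cdots + a_{n-1} x_2^{n-1} &= y_2 \\
&\ \ \vdots \\
a_0 + a_1 x_n + a_2 x_n^2 + \cdots + a_{n-1} x_n^{n-1} &= y_n
\end{aligned}
\tag{13}
\]

In these equations the values of the \(x\)'s and \(y\)'s are assumed to be known, so we can view this as a linear system in the
unknowns \(a_0, a_1, \ldots, a_{n-1}\). From this point of view the augmented matrix for the system is

\[
\begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{n-1} & y_1 \\
1 & x_2 & x_2^2 & \cdots & x_2^{n-1} & y_2 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^{n-1} & y_n
\end{bmatrix}
\tag{14}
\]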


and hence the interpolating polynomial can be found by reducing this matrix to reduced row echelon form (Gauss-Jordan 
elimination). 


EXAMPLE 6 Polynomial Interpolation by Gauss-Jordan Elimination 


Find a cubic polynomial whose graph passes through the points 
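\[ (1, 3), \quad (2, -2), \quad (3, -5), \quad (4, 0) \]


Solution Since there are four points, we will use an interpolating polynomial of degree 3. Denote this
polynomial by

\[ p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 \]

and denote the \(x\)- and \(y\)-coordinates of the given points by

\[ x_1 = 1,\ x_2 = 2,\ x_3 = 3,\ x_4 = 4 \quad \text{and} \quad y_1 = 3,\ y_2 = -2,\ y_3 = -5,\ y_4 = 0 \]

Thus, it follows from (14) that the augmented matrix for the linear system in the unknowns \(a_0\), \(a_1\), \(a_2\), and \(a_3\)
is

\[
\begin{bmatrix}
1 & x_1 & x_1^2 & x_1^3 & y_1 \\
1 & x_2 & x_2^2 & x_2^3 & y_2 \\
1 & x_3 & x_3^2 & x_3^3 & y_3 \\
1 & x_4 & x_4^2 & x_4^3 & y_4
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 & 3 \\
1 & 2 & 4 & 8 & -2 \\
1 & 3 & 9 & 27 & -5 \\
1 & 4 & 16 & 64 & 0
\end{bmatrix}
\]

We leave it for you to confirm that the reduced row echelon form of this matrix is

\[
\begin{bmatrix}
1 & 0 & 0 & 0 & 4 \\
0 & 1 & 0 & 0 & 3 \\
0 & 0 & 1 & 0 & -5 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
\]

from which it follows that \(a_0 = 4\), \(a_1 = 3\), \(a_2 = -5\), \(a_3 = 1\). Thus, the interpolating polynomial is

\[ p(x) = 4 + 3x - 5x^2 + x^3 \]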
The graph of this polynomial and the given points are shown in Figure 1.8.12. 


Figure 1.8.12
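The procedure of Example 6 can be sketched in code as follows: build the augmented matrix of powers of the \(x\)-coordinates, then apply Gauss-Jordan elimination. The function name `interpolate` is our own, exact fractions are used to avoid rounding, and the sketch assumes the \(x\)-coordinates are distinct so that a nonzero pivot can always be found.

```python
from fractions import Fraction


def interpolate(points):
    """Return [a0, a1, ..., a_(n-1)] for the interpolating polynomial through n points."""
    n = len(points)
    # Row i of the augmented matrix: 1, x_i, x_i^2, ..., x_i^(n-1), y_i.
    m = [[Fraction(x) ** k for k in range(n)] + [Fraction(y)] for x, y in points]
    for col in range(n):
        # Assumes distinct x-coordinates, so some row at or below col has a nonzero entry.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [v / lead for v in m[col]]
        # Introduce zeros above and below the leading 1 (Gauss-Jordan).
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    # The last column of the reduced matrix holds the coefficients.
    return [row[n] for row in m]


coeffs = interpolate([(1, 3), (2, -2), (3, -5), (4, 0)])
print([int(c) for c in coeffs])  # [4, 3, -5, 1]
```

The output corresponds to \(p(x) = 4 + 3x - 5x^2 + x^3\), in agreement with the example.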


Remark Later we will give a more efficient method for finding interpolating polynomials that is better suited for 
problems in which the number of data points is large. 


CALCULUS AND CALCULATING UTILITY REQUIRED 


EXAMPLE 7 Approximate Integration


There is no way to evaluate the integral

\[ \int_0^1 \sin\left(\frac{\pi x^2}{2}\right) dx \]

directly since there is no way to express an antiderivative of the integrand in terms of elementary functions.
This integral could be approximated by Simpson's rule or some comparable method, but an alternative
approach is to approximate the integrand by an interpolating polynomial and integrate the approximating
polynomial. For example, let us consider the five points

\[ x_0 = 0, \quad x_1 = 0.25, \quad x_2 = 0.5, \quad x_3 = 0.75, \quad x_4 = 1 \]

that divide the interval [0, 1] into four equally spaced subintervals. The values of

\[ f(x) = \sin\left(\frac{\pi x^2}{2}\right) \]

at these points are approximately

\[ f(0) = 0, \quad f(0.25) = 0.098017, \quad f(0.5) = 0.382683, \quad f(0.75) = 0.77301, \quad f(1) = 1 \]

The interpolating polynomial is (verify)

\[ p(x) = 0.098796x + 0.762356x^2 + 2.14429x^3 - 2.00544x^4 \tag{15} \]

and

\[ \int_0^1 p(x)\,dx \approx 0.438501 \tag{16} \]
QO 


As shown in Figure 1.8.13, the graphs of fand p match very closely over the interval [0, 1], so the 
approximation is quite good. 


Figure 1.8.13 Graphs of \(p(x)\) and \(\sin(\pi x^2/2)\)
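The computation in Example 7 can be imitated with floating-point arithmetic. In the sketch below the names `interpolate`, `f`, `nodes`, and `integral` are our own; the elimination uses partial pivoting (a choice of ours, not something the text prescribes), and the polynomial is integrated term by term using \(\int_0^1 x^k\,dx = 1/(k+1)\).

```python
import math


def interpolate(points):
    """Return floating-point coefficients [a0, ..., a_(n-1)] of the interpolating polynomial."""
    n = len(points)
    m = [[x ** k for k in range(n)] + [y] for x, y in points]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [v / lead for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [row[n] for row in m]


def f(x):
    return math.sin(math.pi * x * x / 2)


nodes = [0.0, 0.25, 0.5, 0.75, 1.0]
coeffs = interpolate([(x, f(x)) for x in nodes])
# Integrate a0 + a1 x + ... + a4 x^4 over [0, 1] term by term.
integral = sum(a / (k + 1) for k, a in enumerate(coeffs))
print(round(integral, 4))  # about 0.4385
```

Because the function values here are not rounded to six places first, the coefficients may differ slightly from those in (15) in the last digits, but the integral agrees with (16) to the accuracy shown.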


Concept Review

• Network
• Branches
• Nodes
• Flow conservation
• Electrical circuits: battery, resistor, poles (positive and negative), electrical potential, Ohm's law, Kirchhoff's
  current law, Kirchhoff's voltage law
• Chemical equations: reactants, products, balanced equation
• Interpolating polynomial

Skills

• Find the flow rates and directions of flow in branches of a network.
• Find the amount of current flowing through parts of an electrical circuit.
• Write a balanced chemical equation for a given chemical reaction.
• Find an interpolating polynomial for a graph passing through a given collection of points.


Exercise Set 1.8 


1. The accompanying figure shows a network in which the flow rate and direction of flow in certain branches are known. 
Find the flow rates and directions of flow in the remaining branches. 


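Figure Ex-1 [network diagram; the known branch flow rates shown are 50, 30, 60, 50, and 40]

Answer:

[network diagram with all branch flow rates labeled: 50, 40, 10, 30, 60, 10, 50, and 40]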


2. The accompanying figure shows known flow rates of hydrocarbons into and out of a network of pipes at an oil refinery. 


(a) Set up a linear system whose solution provides the unknown flow rates. 


(b) Solve the system for the unknown flow rates. 
(c) Find the flow rates and directions of flow if \(x_4 = 50\) and \(x_6 = 0\).


Figure Ex-2


3. The accompanying figure shows a network of one-way streets with traffic flowing in the directions indicated. The flow
rates along the streets are measured as the average number of vehicles per hour.
(a) Set up a linear system whose solution provides the unknown flow rates. 
(b) Solve the system for the unknown flow rates. 


(c) If the flow along the road from A to B must be reduced for construction, what is the minimum flow that is required 
to keep traffic flowing on all roads? 


Figure Ex-3 


Answer: 


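(a) \(x_3 - x_4 = -500\), \(-x_1 + x_4 = 100\), \(x_1 - x_2 = 300\), \(x_2 - x_3 = 100\)
(b) \(x_1 = -100 + t\), \(x_2 = -400 + t\), \(x_3 = -500 + t\), \(x_4 = t\)
(c) For all rates to be nonnegative, we need \(t \geq 500\); the minimum is \(t = 500\) cars per hour, in which case
\(x_1 = 400\), \(x_2 = 100\), \(x_3 = 0\), \(x_4 = 500\)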


4. The accompanying figure shows a network of one-way streets with traffic flowing in the directions indicated. The flow
rates along the streets are measured as the average number of vehicles per hour.
(a) Set up a linear system whose solution provides the unknown flow rates. 
(b) Solve the system for the unknown flow rates. 


(c) Is it possible to close the road from A to B for construction and keep traffic flowing on the other streets? Explain. 


Figure Ex-4 [network of one-way streets through intersections A and B; the known flow rates shown are 300, 200, 100,
500, 600, 400, 450, 350, 600, and 400]


In Exercises 5–8, analyze the given electrical circuits by finding the unknown currents.


5. 


1V+ 


202 


Answer: 


algalsalg= 5h, pete ih 


ht 

3V I, 
In Exercises 9-12, write a balanced equation for the given chemical reaction. 
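9. \(\mathrm{C_3H_8} + \mathrm{O_2} \longrightarrow \mathrm{CO_2} + \mathrm{H_2O}\) (propane combustion)


Answer:

\(x_1 = 1\), \(x_2 = 5\), \(x_3 = 3\), and \(x_4 = 4\); the balanced equation is \(\mathrm{C_3H_8} + 5\,\mathrm{O_2} \longrightarrow 3\,\mathrm{CO_2} + 4\,\mathrm{H_2O}\)
10. \(\mathrm{C_6H_{12}O_6} \longrightarrow \mathrm{CO_2} + \mathrm{C_2H_5OH}\) (fermentation of sugar)
11. \(\mathrm{CH_3COF} + \mathrm{H_2O} \longrightarrow \mathrm{CH_3COOH} + \mathrm{HF}\)


Answer:

\(x_1 = x_2 = x_3 = x_4 = t\); the balanced equation is \(\mathrm{CH_3COF} + \mathrm{H_2O} \longrightarrow \mathrm{CH_3COOH} + \mathrm{HF}\)
12. \(\mathrm{CO_2} + \mathrm{H_2O} \longrightarrow \mathrm{C_6H_{12}O_6} + \mathrm{O_2}\) (photosynthesis)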
13. Find the quadratic polynomial whose graph passes through the points (1, 1), (2, 2), and (3, 5). 


Answer: 


\(p(x) = x^2 - 2x + 2\)
14. Find the quadratic polynomial whose graph passes through the points (0, 0), (—1, 1), and (1, 1). 
15. Find the cubic polynomial whose graph passes through the points (—1, —1), (0, 1), (1, 3), (4, -1). 


Answer: 


\(p(x) = 1 + \tfrac{13}{6}x - \tfrac{1}{6}x^3\)


16. The accompanying figure shows the graph of a cubic polynomial. Find the polynomial. 


Figure Ex-16 [graph of a cubic polynomial; the horizontal axis is marked from 1 to 8 and the vertical axis at 10]
17. (a) Find an equation that represents the family of all second-degree polynomials that pass through the points (0, 1) and
(1, 2). [Hint: The equation will involve one arbitrary parameter that produces the members of the family when
varied.]


(b) By hand, or with the help of a graphing utility, sketch four curves in the family. 
Answer: 


(a) Using \(a_1 = k\) as a parameter, \(p(x) = 1 + kx + (1 - k)x^2\) where \(-\infty < k < \infty\).
(b) The graphs for \(k = 0\), 1, 2, and 3 are shown.

[Figure: the curves for \(k = 0\), 1, 2, and 3]


18. In this section we have selected only a few applications of linear systems. Using the Internet as a search tool, try to find 
some more real-world applications of such systems. Select one that is of interest to you, and write a paragraph about it. 


True-False Exercises 


In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 
(a) In any network, the sum of the flows out of a node must equal the sum of the flows into a node. 
Answer: 


True 


(b) When a current passes through a resistor, there is an increase in the electrical potential in a circuit. 
Answer: 


False 


(c) Kirchhoff's current law states that the sum of the currents flowing into a node equals the sum of the currents flowing out 
of the node. 


Answer: 


True 


(d) A chemical equation is called balanced if the total number of atoms on each side of the equation is the same.
Answer: 


False 


(e) Given any \(n\) points in the \(xy\)-plane, there is a unique polynomial of degree \(n - 1\) or less whose graph passes through
those points.


Answer: 


False 




1.9 Leontief Input-Output Models 


In 1973 the economist Wassily Leontief was awarded the Nobel prize for his work on economic modeling in which he 
used matrix methods to study the relationships between different sectors in an economy. In this section we will discuss 
some of the ideas developed by Leontief. 


Inputs and Outputs in an Economy 


One way to analyze an economy is to divide it into sectors and study how the sectors interact with one another. For 
example, a simple economy might be divided into three sectors—manufacturing, agriculture, and utilities. Typically, a 
sector will produce certain outputs but will require inputs from the other sectors and itself. For example, the agricultural 
sector may produce wheat as an output but will require inputs of farm machinery from the manufacturing sector, 
electrical power from the utilities sector, and food from its own sector to feed its workers. Thus, we can imagine an 
economy to be a network in which inputs and outputs flow in and out of the sectors; the study of such flows is called 
input-output analysis. Inputs and outputs are commonly measured in monetary units (dollars or millions of dollars, for 
example) but other units of measurement are also possible. 


The flows between sectors of a real economy are not always obvious. For example, in World War II the United States had 
a demand for 50,000 new airplanes that required the construction of many new aluminum manufacturing plants. This 
produced an unexpectedly large demand for certain copper electrical components, which in turn produced a copper 
shortage. The problem was eventually resolved by using silver borrowed from Fort Knox as a copper substitute. In all 
likelihood modern input-output analysis would have anticipated the copper shortage. 


Most sectors of an economy will produce outputs, but there may exist sectors that consume outputs without producing 
anything themselves (the consumer market, for example). Those sectors that do not produce outputs are called open 
sectors. Economies with no open sectors are called closed economies, and economies with one or more open sectors are 
called open economies (Figure 1.9.1). In this section we will be concerned with economies with one open sector, and our 
primary goal will be to determine the output levels that are required for the productive sectors to sustain themselves and 
satisfy the demand of the open sector. 


Figure 1.9.1


Leontief Model of an Open Economy 


Let us consider a simple open economy with one open sector and three product-producing sectors: manufacturing, 
agriculture, and utilities. Assume that inputs and outputs are measured in dollars and that the inputs required by the 


productive sectors to produce one dollar's worth of output are in accordance with Table 1.

Table 1

                                  Input Required per Dollar Output
                            Manufacturing    Agriculture    Utilities
Provider   Manufacturing       $0.50            $0.10         $0.10
           Agriculture         $0.20            $0.50         $0.30
           Utilities           $0.10            $0.30         $0.40


Historical Note It is somewhat ironic that it was the Russian-born Wassily Leontief who won the Nobel prize 
in 1973 for pioneering the modern methods for analyzing free-market economies. Leontief was a precocious 
student who entered the University of Leningrad at age 15. Bothered by the intellectual restrictions of the Soviet 
system, he was put in jail for anti-Communist activities, after which he headed for the University of Berlin, 
receiving his Ph.D. there in 1928. He came to the United States in 1931, where he held professorships at Harvard 
and then New York University. 

[Image: © Bettmann/©Corbis] 


Usually, one would suppress the labeling and express this matrix as

\[
C = \begin{bmatrix}
0.5 & 0.1 & 0.1 \\
0.2 & 0.5 & 0.3 \\
0.1 & 0.3 & 0.4
\end{bmatrix}
\tag{1}
\]

This is called the consumption matrix (or sometimes the technology matrix) for the economy. The column vectors

\[
\mathbf{c}_1 = \begin{bmatrix} 0.5 \\ 0.2 \\ 0.1 \end{bmatrix}, \quad
\mathbf{c}_2 = \begin{bmatrix} 0.1 \\ 0.5 \\ 0.3 \end{bmatrix}, \quad
\mathbf{c}_3 = \begin{bmatrix} 0.1 \\ 0.3 \\ 0.4 \end{bmatrix}
\]

in \(C\) list the inputs required by the manufacturing, agricultural, and utilities sectors, respectively, to produce $1.00 worth
of output. These are called the consumption vectors of the sectors. For example, \(\mathbf{c}_1\) tells us that to produce $1.00 worth of
output the manufacturing sector needs $0.50 worth of manufacturing output, $0.20 worth of agricultural output, and
$0.10 worth of utilities output.


What is the economic significance of the row sums 
of the consumption matrix? 


Continuing with the above example, suppose that the open sector wants the economy to supply it manufactured goods, 
agricultural products, and utilities with dollar values: 


\(d_1\) dollars of manufactured goods
\(d_2\) dollars of agricultural products
\(d_3\) dollars of utilities


The column vector d that has these numbers as successive components is called the outside demand vector. Since the 
product-producing sectors consume some of their own output, the dollar value of their output must cover their own needs 
plus the outside demand. Suppose that the dollar values required to do this are 


\(x_1\) dollars of manufactured goods
\(x_2\) dollars of agricultural products
\(x_3\) dollars of utilities


The column vector x that has these numbers as successive components is called the production vector for the economy. 
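For the economy with consumption matrix (1), that portion of the production vector \(\mathbf{x}\) that will be consumed by the three
productive sectors is

\[
\underbrace{x_1 \begin{bmatrix} 0.5 \\ 0.2 \\ 0.1 \end{bmatrix}}_{\substack{\text{Fractions consumed} \\ \text{by manufacturing}}}
+ \underbrace{x_2 \begin{bmatrix} 0.1 \\ 0.5 \\ 0.3 \end{bmatrix}}_{\substack{\text{Fractions consumed} \\ \text{by agriculture}}}
+ \underbrace{x_3 \begin{bmatrix} 0.1 \\ 0.3 \\ 0.4 \end{bmatrix}}_{\substack{\text{Fractions consumed} \\ \text{by utilities}}}
= \begin{bmatrix} 0.5 & 0.1 & 0.1 \\ 0.2 & 0.5 & 0.3 \\ 0.1 & 0.3 & 0.4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= C\mathbf{x}
\]

The vector \(C\mathbf{x}\) is called the intermediate demand vector for the economy. Once the intermediate demand is met, the
portion of the production that is left to satisfy the outside demand is \(\mathbf{x} - C\mathbf{x}\). Thus, if the outside demand vector is \(\mathbf{d}\), then
\(\mathbf{x}\) must satisfy the equation

\[
\underbrace{\mathbf{x}}_{\substack{\text{Amount} \\ \text{produced}}}
- \underbrace{C\mathbf{x}}_{\substack{\text{Intermediate} \\ \text{demand}}}
= \underbrace{\mathbf{d}}_{\substack{\text{Outside} \\ \text{demand}}}
\]

which we will find convenient to rewrite as

\[ (I - C)\mathbf{x} = \mathbf{d} \tag{2} \]

The matrix \(I - C\) is called the Leontief matrix and (2) is called the Leontief equation.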


EXAMPLE 1 Satisfying Outside Demand


Consider the economy described in Table 1. Suppose that the open sector has a demand for $7900 worth of
manufacturing products, $3950 worth of agricultural products, and $1975 worth of utilities.


(a) Can the economy meet this demand?


(b) If so, find a production vector \(\mathbf{x}\) that will meet it exactly.


Solution The consumption matrix, production vector, and outside demand vector are

\[
C = \begin{bmatrix} 0.5 & 0.1 & 0.1 \\ 0.2 & 0.5 & 0.3 \\ 0.1 & 0.3 & 0.4 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad
\mathbf{d} = \begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix}
\tag{3}
\]

To meet the outside demand, the vector \(\mathbf{x}\) must satisfy the Leontief equation (2), so the problem reduces to
solving the linear system

\[
\underbrace{\begin{bmatrix} 0.5 & -0.1 & -0.1 \\ -0.2 & 0.5 & -0.3 \\ -0.1 & -0.3 & 0.6 \end{bmatrix}}_{I - C}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\mathbf{x}}
= \underbrace{\begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix}}_{\mathbf{d}}
\tag{4}
\]

(if consistent). We leave it for you to show that the reduced row echelon form of the augmented matrix for
this system is

\[
\begin{bmatrix}
1 & 0 & 0 & 27{,}500 \\
0 & 1 & 0 & 33{,}750 \\
0 & 0 & 1 & 24{,}750
\end{bmatrix}
\]

This tells us that (4) is consistent, and the economy can satisfy the demand of the open sector exactly by
producing $27,500 worth of manufacturing output, $33,750 worth of agricultural output, and $24,750
worth of utilities output.
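The row reduction left to the reader in Example 1 can be checked with a short program. The sketch below forms \(I - C\) from the consumption matrix and solves the Leontief equation by Gauss-Jordan elimination in exact arithmetic; the helper name `solve` and the variable names are our own, and the solver assumes the system has a unique solution.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination (unique solution assumed)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [v / lead for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [p - factor * q for p, q in zip(m[r], m[col])]
    return [row[n] for row in m]


# Consumption matrix (1), entered as exact decimals.
C = [[Fraction(s) for s in row] for row in (
    ("0.5", "0.1", "0.1"),
    ("0.2", "0.5", "0.3"),
    ("0.1", "0.3", "0.4"),
)]
d = [Fraction(7900), Fraction(3950), Fraction(1975)]

n = len(C)
# Leontief matrix I - C.
leontief = [[(1 if i == j else 0) - C[i][j] for j in range(n)] for i in range(n)]
x = solve(leontief, d)
print([int(v) for v in x])  # [27500, 33750, 24750]
```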


Productive Open Economies 


In the preceding discussion we considered an open economy with three product-producing sectors; the same ideas apply 
to an open economy with n product-producing sectors. In this case, the consumption matrix, production vector, and 
outside demand vector have the form 


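\[
C = \begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1n} \\
c_{21} & c_{22} & \cdots & c_{2n} \\
\vdots & \vdots & & \vdots \\
c_{n1} & c_{n2} & \cdots & c_{nn}
\end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad
\mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix}
\]

where all entries are nonnegative and

\(c_{ij}\) = the monetary value of the output of the \(i\)th sector that is needed by the \(j\)th sector to produce one unit of output
\(x_i\) = the monetary value of the output of the \(i\)th sector
\(d_i\) = the monetary value of the output of the \(i\)th sector that is required to meet the demand of the open sector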


Remark Note that the jth column vector of C contains the monetary values that the jth sector requires of the other 
sectors to produce one monetary unit of output, and the ith row vector of C contains the monetary values required of the 
ith sector by the other sectors for each of them to produce one monetary unit of output. 


As discussed in our example above, a production vector x that meets the demand d of the outside sector must satisfy the 
Leontief equation 


\[ (I - C)\mathbf{x} = \mathbf{d} \]

If the matrix \(I - C\) is invertible, then this equation has the unique solution

\[ \mathbf{x} = (I - C)^{-1}\mathbf{d} \tag{5} \]

for every demand vector \(\mathbf{d}\). However, for \(\mathbf{x}\) to be a valid production vector it must have nonnegative entries, so the
problem of importance in economics is to determine conditions under which the Leontief equation has a solution with
nonnegative entries.


It is evident from the form of (5) that if \(I - C\) is invertible, and if \((I - C)^{-1}\) has nonnegative entries, then for every
demand vector \(\mathbf{d}\) the corresponding \(\mathbf{x}\) will also have nonnegative entries, and hence will be a valid production vector for
the economy. Economies for which \((I - C)^{-1}\) has nonnegative entries are said to be productive. Such economies are
desirable because demand can always be met by some level of production. The following theorem, whose proof can be
found in many books on economics, gives conditions under which open economies are productive.


THEOREM 1.9.1 


If \(C\) is the consumption matrix for an open economy, and if all of the column sums are less than 1, then the matrix
\(I - C\) is invertible, the entries of \((I - C)^{-1}\) are nonnegative, and the economy is productive.


Remark The jth column sum of C represents the total dollar value of input that the jth sector requires to produce $1 of 
output, so if the jth column sum is less than 1, then the jth sector requires less than $1 of input to produce $1 of output; in 
this case we say that the jth sector is profitable. Thus, Theorem 1.9.1 states that if all product-producing sectors of an 
open economy are profitable, then the economy is productive. In the exercises we will ask you to show that an open 
economy is productive if all of the row sums of C are less than 1 (Exercise 11). Thus, an open economy is productive if 
either all of the column sums or all of the row sums of C are less than 1. 


EXAMPLE 2 An Open Economy Whose Sectors Are All Profitable 


The column sums of the consumption matrix \(C\) in (1) are less than 1, so \((I - C)^{-1}\) exists and has nonnegative
entries. Use a calculating utility to confirm this, and use this inverse to solve Equation 4 in Example 1.


Solution We leave it for you to show that

\[
(I - C)^{-1} \approx \begin{bmatrix}
2.65823 & 1.13924 & 1.01266 \\
1.89873 & 3.67089 & 2.15190 \\
1.39241 & 2.02532 & 2.91139
\end{bmatrix}
\]

This matrix has nonnegative entries, and

\[
\mathbf{x} = (I - C)^{-1}\mathbf{d} \approx
\begin{bmatrix}
2.65823 & 1.13924 & 1.01266 \\
1.89873 & 3.67089 & 2.15190 \\
1.39241 & 2.02532 & 2.91139
\end{bmatrix}
\begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix}
\approx \begin{bmatrix} 27{,}500 \\ 33{,}750 \\ 24{,}750 \end{bmatrix}
\]


which is consistent with the solution in Example 1. 
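In place of a calculating utility, the checks in Example 2 can be sketched in a few lines of code. The program below tests the column sums of \(C\), inverts \(I - C\) by reducing \([\,I - C \mid I\,]\), and multiplies the inverse by \(\mathbf{d}\). The function name `inverse` and the variable names are our own, and the inversion assumes the matrix is invertible.

```python
from fractions import Fraction


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination on [a | I] (invertibility assumed)."""
    n = len(a)
    m = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [v / lead for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [p - factor * q for p, q in zip(m[r], m[col])]
    # The right half of the reduced matrix is the inverse.
    return [row[n:] for row in m]


C = [[Fraction(s) for s in row] for row in (
    ("0.5", "0.1", "0.1"),
    ("0.2", "0.5", "0.3"),
    ("0.1", "0.3", "0.4"),
)]
d = [7900, 3950, 1975]
n = len(C)

# Every column sum below 1 means every sector is profitable.
column_sums = [sum(C[i][j] for i in range(n)) for j in range(n)]
all_profitable = all(s < 1 for s in column_sums)

inv = inverse([[(1 if i == j else 0) - C[i][j] for j in range(n)] for i in range(n)])
nonnegative = all(v >= 0 for row in inv for v in row)
production = [sum(inv[i][j] * d[j] for j in range(n)) for i in range(n)]

print(all_profitable, nonnegative)   # True True
print(round(float(inv[0][0]), 5))    # 2.65823
print([int(v) for v in production])  # [27500, 33750, 24750]
```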


Concept Review

• Sectors
• Inputs
• Outputs
• Input-output analysis
• Open sector
• Economies: open, closed
• Consumption (technology) matrix
• Consumption vector
• Outside demand vector
• Production vector
• Intermediate demand vector
• Leontief matrix
• Leontief equation

Skills

• Construct a consumption matrix for an economy.
• Understand the relationships among the vectors of a sector of an economy: consumption, outside demand,
  production, and intermediate demand.


Exercise Set 1.9 


1. An automobile mechanic (M) and a body shop (B) use each other's services. For each $1.00 of business that M does, it
uses $0.50 of its own services and $0.25 of B's services, and for each $1.00 of business that B does it uses $0.10 of its
own services and $0.25 of M's services.


(a) Construct a consumption matrix for this economy. 


(b) How much must M and B each produce to provide customers with $7000 worth of mechanical work and $14,000 
worth of body work? 


Answer: 


(a) \(\begin{bmatrix} 0.50 & 0.25 \\ 0.25 & 0.10 \end{bmatrix}\)


(b) \(\begin{bmatrix} \$25{,}290 \\ \$22{,}581 \end{bmatrix}\)


2. A simple economy produces food (F) and housing (H). The production of $1.00 worth of food requires $0.30 worth of
food and $0.10 worth of housing, and the production of $1.00 worth of housing requires $0.20 worth of food and
$0.60 worth of housing.


(a) Construct a consumption matrix for this economy. 


(b) What dollar value of food and housing must be produced for the economy to provide consumers $130,000 worth 
of food and $130,000 worth of housing? 


3. Consider the open economy described by the accompanying table, where the input is in dollars needed for $1.00 of
output.
(a) Find the consumption matrix for the economy. 


(b) Suppose that the open sector has a demand for $1930 worth of housing, $3860 worth of food, and $5790 worth of 
utilities. Use row reduction to find a production vector that will meet this demand exactly. 


Table Ex-3

                            Input Required per Dollar Output
                          Housing      Food      Utilities
Provider   Housing         $0.10       $0.60       $0.40
           Food            $0.30       $0.20       $0.30
           Utilities       $0.40       $0.10       $0.20


Answer:


(a) \(\begin{bmatrix} 0.1 & 0.6 & 0.4 \\ 0.3 & 0.2 & 0.3 \\ 0.4 & 0.1 & 0.2 \end{bmatrix}\)


(b) \(\begin{bmatrix} \$31{,}500 \\ \$26{,}500 \\ \$26{,}300 \end{bmatrix}\)

4. A company produces Web design, software, and networking services. View the company as an open economy 
described by the accompanying table, where input is in dollars needed for $1.00 of output. 


(a) Find the consumption matrix for the company. 


(b) Suppose that the customers (the open sector) have a demand for $5400 worth of Web design, $2700 worth of 
software, and $900 worth of networking. Use row reduction to find a production vector that will meet this demand 
exactly. 


Table Ex-4 


Input Required per Dollar Output


Web Design| Software| Networking 


Web Design| $ 0.40 $ 0.20 


Provider| Software $ 0.30 $ 0.35 
Networking | $0.15 $0.10 


In Exercises 5–6, use matrix inversion to find the production vector x that meets the demand d for the consumption
matrix C. 


Boe 1 1. S|, 5 | 50 
c=[)5 a a=|*| 
Answer: 

123.08 
202.56 

Gin 10S U1 |. | ee 

c=|93 He a=|%,| 


7. Consider an open economy with consumption matrix 


(a) Show that the economy can meet a demand of \(d_1 = 2\) units from the first sector and \(d_2 = 0\) units from the second
sector, but it cannot meet a demand of \(d_1 = 2\) units from the first sector and \(d_2 = 1\) unit from the second sector.


(b) Give both a mathematical and an economic explanation of the result in part (a). 


8. Consider an open economy with consumption matrix 


Pl Pole Pole 
| cafe BI 


If the open sector demands the same dollar value from each product-producing sector, which such sector must 
produce the greatest dollar value to meet the demand? 


9. Consider an open economy with consumption matrix 
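\[ C = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & 0 \end{bmatrix} \]
Show that the Leontief equation \(\mathbf{x} - C\mathbf{x} = \mathbf{d}\) has a unique solution for every demand vector \(\mathbf{d}\) if \(c_{21}c_{12} < 1 - c_{11}\).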


10. (a) Consider an open economy with a consumption matrix \(C\) whose column sums are less than 1, and let \(\mathbf{x}\) be the
production vector that satisfies an outside demand \(\mathbf{d}\); that is, \((I - C)^{-1}\mathbf{d} = \mathbf{x}\). Let \(\mathbf{d}_j\) be the demand vector that is
obtained by increasing the \(j\)th entry of \(\mathbf{d}\) by 1 and leaving the other entries fixed. Prove that the production vector
\(\mathbf{x}_j\) that meets this demand is

\[ \mathbf{x}_j = \mathbf{x} + j\text{th column vector of } (I - C)^{-1} \]

(b) In words, what is the economic significance of the \(j\)th column vector of \((I - C)^{-1}\)? [Hint: Look at \(\mathbf{x}_j - \mathbf{x}\).]
11. Prove: If \(C\) is an \(n \times n\) matrix whose entries are nonnegative and whose row sums are less than 1, then \(I - C\) is
invertible and \((I - C)^{-1}\) has nonnegative entries. [Hint: \((A^T)^{-1} = (A^{-1})^T\) for any invertible matrix \(A\).]


True-False Exercises 
In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 
(a) Sectors of an economy that produce outputs are called open sectors. 

Answer: 


False 


(b) A closed economy is an economy that has no open sectors. 
Answer: 


True 


(c) The rows of a consumption matrix represent the outputs in a sector of an economy. 
Answer: 


False 


(d) If the column sums of the consumption matrix are all less than 1, then the Leontief matrix is invertible.
Answer: 


True 


(e) The Leontief equation relates the production vector for an economy to the outside demand vector.
Answer: 


True 




Chapter 1 Supplementary Exercises 


In Exercises 1-4 the given matrix represents an augmented matrix for a linear system. Write the 
corresponding set of linear equations for the system, and use Gaussian elimination to solve the linear system. 
Introduce free parameters as necessary. 


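1. \(\begin{bmatrix} 3 & -1 & 0 & 4 & 1 \\ 2 & 0 & 3 & 3 & -1 \end{bmatrix}\)

Answer:

\[
\begin{aligned}
3x_1 - x_2 + 4x_4 &= 1 \\
2x_1 + 3x_3 + 3x_4 &= -1
\end{aligned}
\]

\[ x_1 = -\tfrac{3}{2}s - \tfrac{3}{2}t - \tfrac{1}{2}, \quad x_2 = -\tfrac{9}{2}s - \tfrac{1}{2}t - \tfrac{5}{2}, \quad x_3 = s, \quad x_4 = t \]

2. \(\begin{bmatrix} 1 & 4 & -1 \\ -2 & -8 & 2 \\ 3 & 12 & -3 \\ 0 & 0 & 0 \end{bmatrix}\)

3. \(\begin{bmatrix} 2 & -4 & 1 & 6 \\ -4 & 0 & 3 & -1 \\ 0 & 1 & -1 & 3 \end{bmatrix}\)

Answer:

\[
\begin{aligned}
2x_1 - 4x_2 + x_3 &= 6 \\
-4x_1 + 3x_3 &= -1 \\
x_2 - x_3 &= 3
\end{aligned}
\]

\[ x_1 = -\tfrac{17}{2}, \quad x_2 = -\tfrac{26}{3}, \quad x_3 = -\tfrac{35}{3} \]

4. \(\begin{bmatrix} 3 & 1 & -2 \\ -9 & -3 & 6 \\ 6 & 2 & 1 \end{bmatrix}\)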
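5. Use Gauss-Jordan elimination to solve for \(x'\) and \(y'\) in terms of \(x\) and \(y\).

\[
\begin{aligned}
x &= \tfrac{3}{5}x' - \tfrac{4}{5}y' \\
y &= \tfrac{4}{5}x' + \tfrac{3}{5}y'
\end{aligned}
\]

Answer:

\[ x' = \tfrac{3}{5}x + \tfrac{4}{5}y, \quad y' = -\tfrac{4}{5}x + \tfrac{3}{5}y \]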

6. Use Gauss-Jordan elimination to solve for \(x'\) and \(y'\) in terms of \(x\) and \(y\).

\[
\begin{aligned}
x &= x'\cos\theta - y'\sin\theta \\
y &= x'\sin\theta + y'\cos\theta
\end{aligned}
\]
7. Find positive integers that satisfy

\[
\begin{aligned}
x + y + z &= 9 \\
x + 5y + 10z &= 44
\end{aligned}
\]

Answer:

\(x = 4\), \(y = 2\), \(z = 3\)
8. A box containing pennies, nickels, and dimes has 13 coins with a total value of 83 cents. How many coins 
of each type are in the box? 
9. Let

\[
\begin{bmatrix}
a & 0 & b & 2 \\
a & a & 4 & 4 \\
0 & a & 2 & b
\end{bmatrix}
\]

be the augmented matrix for a linear system. Find for what values of \(a\) and \(b\) the system has
(a) a unique solution. 

(b) a one-parameter solution. 

(c) a two-parameter solution. 


(d) no solution. 
Answer: 


(a) \(a \neq 0\), \(b \neq 2\)
(b) \(a \neq 0\), \(b = 2\)
(c) \(a = 0\), \(b = 2\)
(d) \(a = 0\), \(b \neq 2\)
10. For which value(s) of \(a\) does the following system have zero solutions? One solution? Infinitely many
solutions?

\[
\begin{aligned}
x_1 + x_2 + x_3 &= 4 \\
x_3 &= 2 \\
(a^2 - 4)x_3 &= a - 2
\end{aligned}
\]

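11. Find a matrix \(K\) such that \(AKB = C\) given that

\[
A = \begin{bmatrix} 1 & 4 \\ -2 & 3 \\ 1 & -2 \end{bmatrix}, \quad
B = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & -1 \end{bmatrix}, \quad
C = \begin{bmatrix} 8 & 6 & -6 \\ 6 & -1 & 1 \\ -4 & 0 & 0 \end{bmatrix}
\]

Answer:

\[ K = \begin{bmatrix} 0 & 2 \\ 1 & 1 \end{bmatrix} \]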
12. How should the coefficients \(a\), \(b\), and \(c\) be chosen so that the system

\[
\begin{aligned}
ax + by - 3z &= -3 \\
-2x - by + cz &= -1 \\
ax + 3y - cz &= -3
\end{aligned}
\]

has the solution \(x = 1\), \(y = -1\), and \(z = 2\)?


13. In each part, solve the matrix equation for X. 


(a) =—1 0 1 
12 0 
4 ie ee ee -|_3 1 | 


31-1 
(b) y[1 -1 2]_[-5 -1 0 
x cee | 


© JE a 


Answer: 
(a) a —1 3 —!1 
60 1 
(b) Bs 1 —2 
E 1 
(c) _113 _ 160 
37 37 
=) 00: ae 
37 37 


14. Let A be a square matrix. 
(a) Show that \((I - A)^{-1} = I + A + A^2 + A^3\) if \(A^4 = 0\).
(b) Show that
\[ (I - A)^{-1} = I + A + A^2 + \cdots + A^n \]
if \(A^{n+1} = 0\).
15. Find values of \(a\), \(b\), and \(c\) such that the graph of the polynomial \(p(x) = ax^2 + bx + c\) passes through the
points (1, 2), (−1, 6), and (2, 3).
Answer:

\(a = 1\), \(b = -2\), \(c = 3\)
16. (Calculus required) Find values of \(a\), \(b\), and \(c\) such that the graph of the polynomial
\(p(x) = ax^2 + bx + c\) passes through the point (−1, 0) and has a horizontal tangent at (2, −9).

17. Let \(J_n\) be the \(n \times n\) matrix each of whose entries is 1. Show that if \(n > 1\), then

\[ (I - J_n)^{-1} = I - \frac{1}{n - 1} J_n \]

18. Show that if a square matrix \(A\) satisfies

\[ A^3 + 4A^2 - 2A + 7I = 0 \]

then so does \(A^T\).

19. Prove: If \(B\) is invertible, then \(AB^{-1} = B^{-1}A\) if and only if \(AB = BA\).

20. Prove: If \(A\) is invertible, then \(A + B\) and \(I + BA^{-1}\) are both invertible or both not invertible.

21. Prove: If \(A\) is an \(m \times n\) matrix and \(B\) is the \(n \times 1\) matrix each of whose entries is \(1/n\), then

\[ AB = \begin{bmatrix} \bar{r}_1 \\ \bar{r}_2 \\ \vdots \\ \bar{r}_m \end{bmatrix} \]

where \(\bar{r}_i\) is the average of the entries in the \(i\)th row of \(A\).

22. (Calculus required) If the entries of the matrix

\[
C = \begin{bmatrix}
c_{11}(x) & c_{12}(x) & \cdots & c_{1n}(x) \\
c_{21}(x) & c_{22}(x) & \cdots & c_{2n}(x) \\
\vdots & \vdots & & \vdots \\
c_{m1}(x) & c_{m2}(x) & \cdots & c_{mn}(x)
\end{bmatrix}
\]

are differentiable functions of \(x\), then we define

\[
\frac{dC}{dx} = \begin{bmatrix}
c'_{11}(x) & c'_{12}(x) & \cdots & c'_{1n}(x) \\
c'_{21}(x) & c'_{22}(x) & \cdots & c'_{2n}(x) \\
\vdots & \vdots & & \vdots \\
c'_{m1}(x) & c'_{m2}(x) & \cdots & c'_{mn}(x)
\end{bmatrix}
\]

Show that if the entries in \(A\) and \(B\) are differentiable functions of \(x\) and the sizes of the matrices are such
that the stated operations can be performed, then

(a) \(\dfrac{d}{dx}(kA) = k\,\dfrac{dA}{dx}\)

(b) \(\dfrac{d}{dx}(A + B) = \dfrac{dA}{dx} + \dfrac{dB}{dx}\)

(c) \(\dfrac{d}{dx}(AB) = \dfrac{dA}{dx}B + A\,\dfrac{dB}{dx}\)

23. (Calculus required) Use part (c) of Exercise 22 to show that

\[ \frac{dA^{-1}}{dx} = -A^{-1}\,\frac{dA}{dx}\,A^{-1} \]

State all the assumptions you make in obtaining this formula.

24. Assuming that the stated inverses exist, prove the following equalities.

(a) \((C^{-1} + D^{-1})^{-1} = C(C + D)^{-1}D\)

(b) \((I + CD)^{-1}C = C(I + DC)^{-1}\)

(c) \((C + DD^T)^{-1}D = C^{-1}D(I + D^T C^{-1} D)^{-1}\)



CHAPTER 2

Determinants


CHAPTER CONTENTS


2.1 Determinants by Cofactor Expansion
2.2 Evaluating Determinants by Row Reduction
2.3 Properties of Determinants; Cramer's Rule


INTRODUCTION 


In this chapter we will study “determinants” or, more precisely, “determinant functions.”
Unlike real-valued functions, such as $f(x) = x^2$, that assign a real number to a real
variable x, determinant functions assign a real number $f(A)$ to a matrix variable A.
Although determinants first arose in the context of solving systems of linear equations,
they are no longer used for that purpose in real-world applications. Although they can be
useful for solving very small linear systems (say two or three unknowns), our main
interest in them stems from the fact that they link together various concepts in linear
algebra and provide a useful formula for the inverse of a matrix.




2.1 Determinants by Cofactor Expansion 


In this section we will define the notion of a “determinant.” This will enable us to give a specific formula for the inverse of an 
invertible matrix, whereas up to now we have had only a computational procedure for finding it. This, in turn, will eventually 
provide us with a formula for solutions of certain kinds of linear systems. 


Recall from Theorem 1.4.5 that the $2 \times 2$ matrix
\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]
is invertible if and only if $ad - bc \neq 0$ and that the expression $ad - bc$ is called the determinant of the matrix A. Recall also
that this determinant is denoted by writing
\[ \det(A) = ad - bc \quad \text{or} \quad \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc \tag{1} \]
and that the inverse of A can be expressed in terms of the determinant as
\[ A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \tag{2} \]


WARNING


It is important to keep in mind that det(A) is a number,
whereas A is a matrix.
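Formula 2 can be tried out numerically. The following is a minimal Python sketch, not part of the text: the function name `inverse_2x2` is hypothetical, and exact rational arithmetic from the standard `fractions` module is an assumed choice that avoids rounding.

```python
from fractions import Fraction


def inverse_2x2(a, b, c, d):
    """Invert [[a, b], [c, d]] with Formula 2; return None when det(A) = 0."""
    det = a * d - b * c              # Formula 1: det(A) = ad - bc
    if det == 0:
        return None                  # A is not invertible
    f = Fraction(1) / det            # the scalar 1/det(A)
    return [[f * d, f * (-b)],
            [f * (-c), f * a]]


print(inverse_2x2(5, 2, 3, 1))       # det = -1, so the inverse has integer entries
print(inverse_2x2(1, 2, 2, 4))       # det = 0, so no inverse exists
```

The sketch returns `None` instead of raising an error when the determinant is zero; either convention would illustrate the same point.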


Minors and Cofactors 


One of our main goals in this chapter is to obtain an analog of Formula 2 that is applicable to square matrices of all orders. For
this purpose we will find it convenient to use subscripted entries when writing matrices or determinants. Thus, if we denote a
$2 \times 2$ matrix as
\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \]
then the two equations in (1) take the form
\[ \det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21} \tag{3} \]


We define the determinant of a $1 \times 1$ matrix $A = [a_{11}]$
as $\det[A] = \det[a_{11}] = a_{11}$.


The following definition will be key to our goal of extending the definition of a determinant to higher order matrices.


DEFINITION 1 


If A is a square matrix, then the minor of entry $a_{ij}$ is denoted by $M_{ij}$ and is defined to be the determinant of the
submatrix that remains after the ith row and jth column are deleted from A. The number $(-1)^{i+j} M_{ij}$ is denoted by
$C_{ij}$ and is called the cofactor of entry $a_{ij}$.


EXAMPLE 1 Finding Minors and Cofactors 


Let 


WARNING 


We have followed the standard convention of 
using capital letters to denote minors and cofactors 
even though they are numbers, not matrices. 


The minor of entry $a_{11}$ is


$M_{11} =$


The cofactor of $a_{11}$ is


Similarly, the minor of entry $a_{32}$ is


The cofactor of $a_{32}$ is


$C_{32} = (-1)^{3+2} M_{32} = -M_{32} = -26$


Historical Note The term determinant was first introduced by the German mathematician Carl Friedrich 
Gauss in 1801 (see p. 15), who used them to “determine” properties of certain kinds of functions. 
Interestingly, the term matrix is derived from a Latin word for “womb” because it was viewed as a container 
of determinants. 


Historical Note The term minor is apparently due to the English mathematician James Sylvester (see p.
34), who wrote the following in a paper published in 1850: “Now conceive any one line and any one column
be struck out, we get... a square, one term less in breadth and depth than the original square; and by varying
in every possible selection of the line and column excluded, we obtain, supposing the original square to
consist of n lines and n columns, $n^2$ such minor squares, each of which will represent what I term a “First
Minor Determinant” relative to the principal or complete determinant.”


Remark Note that a minor $M_{ij}$ and its corresponding cofactor $C_{ij}$ are either the same or negatives of each other and that the
relating sign $(-1)^{i+j}$ is either $+1$ or $-1$ in accordance with the pattern in the “checkerboard” array
\[ \begin{bmatrix} + & - & + & - & + & \cdots \\ - & + & - & + & - & \cdots \\ + & - & + & - & + & \cdots \\ - & + & - & + & - & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \end{bmatrix} \]
For example,
\[ C_{11} = M_{11}, \quad C_{21} = -M_{21}, \quad C_{22} = M_{22} \]
and so forth. Thus, it is never really necessary to calculate $(-1)^{i+j}$ to calculate $C_{ij}$; you can simply compute the minor $M_{ij}$
and then adjust the sign in accordance with the checkerboard pattern. Try this in Example 1.
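Definition 1 and the checkerboard rule translate directly into code. The sketch below is an illustration only: the helper names `submatrix`, `det`, `minor`, and `cofactor` are hypothetical, the sample matrix is chosen arbitrarily, and the small recursive `det` is just a convenient way to evaluate the determinant of each submatrix.

```python
def submatrix(m, i, j):
    """Delete row i and column j (0-based) from the square matrix m."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant of a small square matrix, expanding along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(submatrix(m, 0, j))
               for j in range(len(m)))


def minor(m, i, j):
    """Minor M_ij, with i and j counted from 1 as in Definition 1."""
    return det(submatrix(m, i - 1, j - 1))


def cofactor(m, i, j):
    """Cofactor C_ij = (-1)^(i+j) * M_ij."""
    return (-1) ** (i + j) * minor(m, i, j)


A = [[3, 1, 0], [-2, -4, 3], [5, 4, -2]]   # sample matrix, chosen for illustration
print(minor(A, 1, 2), cofactor(A, 1, 2))   # i + j is odd, so the signs differ
print(minor(A, 1, 1), cofactor(A, 1, 1))   # i + j is even, so the values agree
```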


EXAMPLE 2 Cofactor Expansions of a 2 × 2 Matrix


The checkerboard pattern for a $2 \times 2$ matrix $A = [a_{ij}]$ is
\[ \begin{bmatrix} + & - \\ - & + \end{bmatrix} \]
so that
\[ C_{11} = M_{11} = a_{22}, \qquad C_{12} = -M_{12} = -a_{21} \]
\[ C_{21} = -M_{21} = -a_{12}, \qquad C_{22} = M_{22} = a_{11} \]


We leave it for you to use Formula 3 to verify that det(A) can be expressed in terms of cofactors in the following
four ways:
\[ \begin{aligned} \det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} &= a_{11}C_{11} + a_{12}C_{12} \\ &= a_{21}C_{21} + a_{22}C_{22} \\ &= a_{11}C_{11} + a_{21}C_{21} \\ &= a_{12}C_{12} + a_{22}C_{22} \end{aligned} \tag{4} \]


Each of the last four equations is called a cofactor expansion of det(A). In each cofactor expansion the entries and
cofactors all come from the same row or same column of A. For example, in the first equation the entries and
cofactors all come from the first row of A, in the second they all come from the second row of A, in the third they all
come from the first column of A, and in the fourth they all come from the second column of A.


Definition of a General Determinant 


Formula 4 is a special case of the following general result, which we will state without proof. 


THEOREM 2.1.1 


If A is an $n \times n$ matrix, then regardless of which row or column of A is chosen, the number obtained by multiplying the
entries in that row or column by the corresponding cofactors and adding the resulting products is always the same.


This result allows us to make the following definition.


DEFINITION 2


If A is an $n \times n$ matrix, then the number obtained by multiplying the entries in any row or column of A by the
corresponding cofactors and adding the resulting products is called the determinant of A, and the sums themselves are
called cofactor expansions of A. That is,
\[ \det(A) = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj} \tag{5} \]
[cofactor expansion along the jth column]
and
\[ \det(A) = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in} \tag{6} \]
[cofactor expansion along the ith row]
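Formulas 5 and 6 can be sketched in Python as follows. This is an illustrative sketch rather than an efficient algorithm: the names `expand_along_row` and `expand_along_col` are hypothetical, indices are 0-based, and the sample matrix is an arbitrary choice. Running it shows all six expansions of a $3 \times 3$ matrix producing one value, as Theorem 2.1.1 asserts.

```python
def submatrix(m, i, j):
    """Delete row i and column j (0-based) from the square matrix m."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def expand_along_row(m, i):
    """Formula 6: cofactor expansion along row i (0-based)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** (i + j) * m[i][j] * expand_along_row(submatrix(m, i, j), 0)
               for j in range(len(m)))


def expand_along_col(m, j):
    """Formula 5: cofactor expansion along column j (0-based)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** (i + j) * m[i][j] * expand_along_row(submatrix(m, i, j), 0)
               for i in range(len(m)))


A = [[3, 1, 0], [-2, -4, 3], [5, 4, -2]]   # sample matrix, chosen for illustration
values = [expand_along_row(A, i) for i in range(3)]
values += [expand_along_col(A, j) for j in range(3)]
print(values)                               # six expansions, one common value
```

Because each call recurses on $n$ submatrices of order $n - 1$, the cost grows like $n!$, which is why this approach is only practical for small matrices.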


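EXAMPLE 3 Cofactor Expansion Along the First Row


Find the determinant of the matrix
\[ A = \begin{bmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{bmatrix} \]
by cofactor expansion along the first row.


Solution
\[ \det(A) = \begin{vmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{vmatrix} = 3 \begin{vmatrix} -4 & 3 \\ 4 & -2 \end{vmatrix} - 1 \begin{vmatrix} -2 & 3 \\ 5 & -2 \end{vmatrix} + 0 \begin{vmatrix} -2 & -4 \\ 5 & 4 \end{vmatrix} \]
\[ = 3(-4) - (1)(-11) + 0 = -1 \]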


EXAMPLE 4 Cofactor Expansion Along the First Column


Let A be the matrix in Example 3, and evaluate det(A) by cofactor expansion along the first column of A.


Solution
\[ \det(A) = \begin{vmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{vmatrix} = 3 \begin{vmatrix} -4 & 3 \\ 4 & -2 \end{vmatrix} - (-2) \begin{vmatrix} 1 & 0 \\ 4 & -2 \end{vmatrix} + 5 \begin{vmatrix} 1 & 0 \\ -4 & 3 \end{vmatrix} \]
\[ = 3(-4) - (-2)(-2) + 5(3) = -1 \]
This agrees with the result obtained in Example 3.


Note that in Example 4 we had to compute three
cofactors, whereas in Example 3 only two were
needed because the third was multiplied by zero.
As a rule, the best strategy for cofactor
expansion is to expand along a row or column
with the most zeros.


Charles Lutwidge Dodgson (Lewis Carroll) (1832-1898) 


Historical Note Cofactor expansion is not the only method for expressing the determinant of a matrix 
in terms of determinants of lower order. For example, although it is not well known, the English 
mathematician Charles Dodgson, who was the author of Alice's Adventures in Wonderland and Through 
the Looking Glass under the pen name of Lewis Carroll, invented such a method, called “condensation.” 
That method has recently been resurrected from obscurity because of its suitability for parallel 
processing on computers. 

[Image: Time & Life Pictures/Getty Images, Inc.]


EXAMPLE 5 Smart Choice of Row or Column


If A is the 4 x 4 matrix 


then to find det(A) it will be easiest to use cofactor expansion along the second column, since it has the most zeros:
\[ \det(A) = 1 \cdot \begin{vmatrix} 1 & 0 & -1 \\ 1 & -2 & 1 \\ 2 & 0 & 1 \end{vmatrix} \]
For the $3 \times 3$ determinant, it will be easiest to use cofactor expansion along its second column, since it has the most
zeros:
\[ \det(A) = 1 \cdot (-2) \cdot \begin{vmatrix} 1 & -1 \\ 2 & 1 \end{vmatrix} = -2(1 + 2) = -6 \]


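EXAMPLE 6 Determinant of a Lower Triangular Matrix


The following computation shows that the determinant of a $4 \times 4$ lower triangular matrix is the product of its
diagonal entries. Each part of the computation uses a cofactor expansion along the first row.
\[ \begin{vmatrix} a_{11} & 0 & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ a_{31} & a_{32} & a_{33} & 0 \\ a_{41} & a_{42} & a_{43} & a_{44} \end{vmatrix} = a_{11} \begin{vmatrix} a_{22} & 0 & 0 \\ a_{32} & a_{33} & 0 \\ a_{42} & a_{43} & a_{44} \end{vmatrix} = a_{11}a_{22} \begin{vmatrix} a_{33} & 0 \\ a_{43} & a_{44} \end{vmatrix} \]
\[ = a_{11}a_{22}a_{33}|a_{44}| = a_{11}a_{22}a_{33}a_{44} \]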


The method illustrated in Example 6 can be easily adapted to prove the following general result. 


THEOREM 2.1.2 


If A is an $n \times n$ triangular matrix (upper triangular, lower triangular, or diagonal), then det(A) is the product of the
entries on the main diagonal of the matrix; that is, $\det(A) = a_{11}a_{22} \cdots a_{nn}$.
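Theorem 2.1.2 is easy to spot-check numerically. In the sketch below the two triangular matrices are made-up samples and the helper names are hypothetical; the diagonal product is compared with a full cofactor expansion.

```python
from math import prod


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(sub)
    return total


def diagonal_product(m):
    """Product of the main-diagonal entries a11 * a22 * ... * ann."""
    return prod(m[i][i] for i in range(len(m)))


lower = [[2, 0, 0, 0], [5, -1, 0, 0], [7, 4, 3, 0], [1, 9, 6, 4]]
upper = [list(col) for col in zip(*lower)]   # transpose of the lower triangular sample

print(det(lower), diagonal_product(lower))
print(det(upper), diagonal_product(upper))
```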


A Useful Technique for Evaluating 2 x 2 and 3 x 3 Determinants 


Determinants of 2 x 2 and 3 x 3 matrices can be evaluated very efficiently using the pattern suggested in Figure 2.1.1. 


Figure 2.1.1 


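In the $2 \times 2$ case, the determinant can be computed by forming the product of the entries on the rightward arrow and
subtracting the product of the entries on the leftward arrow. In the $3 \times 3$ case we first recopy the first and second columns as
shown in the figure, after which we can compute the determinant by summing the products of the entries on the rightward
arrows and subtracting the products on the leftward arrows. These procedures execute the computations
\[ \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21} \]
\[ \begin{aligned} \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} &= a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \\ &= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}) \\ &= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32} \end{aligned} \]
which agrees with the cofactor expansions along the first row.


WARNING


The arrow technique only works for determinants of
$2 \times 2$ and $3 \times 3$ matrices.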
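The arrow technique amounts to summing three "rightward" products and subtracting three "leftward" products. A minimal Python sketch follows, with hypothetical function names `det2_arrow` and `det3_arrow`; the column indices wrap around modulo 3, which plays the role of recopying the first two columns.

```python
def det2_arrow(m):
    """2 x 2 case: rightward product minus leftward product."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def det3_arrow(m):
    """3 x 3 case: three rightward products minus three leftward products."""
    rightward = sum(m[0][j] * m[1][(j + 1) % 3] * m[2][(j + 2) % 3] for j in range(3))
    leftward = sum(m[0][j] * m[1][(j - 1) % 3] * m[2][(j - 2) % 3] for j in range(3))
    return rightward - leftward


print(det2_arrow([[3, 1], [4, -2]]))
print(det3_arrow([[1, 2, 3], [-4, 5, 6], [7, -8, 9]]))
```

The functions deliberately accept only these two sizes; the same pattern does not give the determinant of a $4 \times 4$ or larger matrix.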


EXAMPLE 7 A Technique for Evaluating 2 × 2 and 3 × 3 Determinants


\[ \begin{vmatrix} 3 & 1 \\ 4 & -2 \end{vmatrix} = (3)(-2) - (1)(4) = -10 \]
\[ \begin{vmatrix} 1 & 2 & 3 \\ -4 & 5 & 6 \\ 7 & -8 & 9 \end{vmatrix} = [45 + 84 + 96] - [105 - 48 - 72] = 240 \]


Concept Review 
• Determinant
• Minor
• Cofactor
• Cofactor expansion


Skills

• Find the minors and cofactors of a square matrix.
• Use cofactor expansion to evaluate the determinant of a square matrix.
• Use the arrow technique to evaluate the determinant of a $2 \times 2$ or $3 \times 3$ matrix.
• Use the determinant of a $2 \times 2$ invertible matrix to find the inverse of that matrix.
• Find the determinant of an upper triangular, lower triangular, or diagonal matrix by inspection.


Exercise Set 2.1 


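In Exercises 1-2, find all the minors and cofactors of the matrix A.


1. \[ A = \begin{bmatrix} 1 & -2 & 3 \\ 6 & 7 & -1 \\ -3 & 1 & 4 \end{bmatrix} \]

Answer:

$M_{11} = 29$, $C_{11} = 29$
$M_{12} = 21$, $C_{12} = -21$
$M_{13} = 27$, $C_{13} = 27$
$M_{21} = -11$, $C_{21} = 11$
$M_{22} = 13$, $C_{22} = 13$
$M_{23} = -5$, $C_{23} = 5$
$M_{31} = -19$, $C_{31} = -19$
$M_{32} = -19$, $C_{32} = 19$
$M_{33} = 19$, $C_{33} = 19$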


8 


ths 

I 
OW ee 
Ke We 
kaAN 


Find

(a) $M_{13}$ and $C_{13}$.
(b) $M_{23}$ and $C_{23}$.
(c) $M_{22}$ and $C_{22}$.
(d) $M_{21}$ and $C_{21}$.


Answer:


(a) $M_{13} = 0$, $C_{13} = 0$
(b) $M_{23} = -96$, $C_{23} = 96$
(c) $M_{22} = -48$, $C_{22} = -48$
(d) $M_{21} = 72$, $C_{21} = -72$


4. Let 


Find

(a) $M_{32}$ and $C_{32}$.
(b) $M_{44}$ and $C_{44}$.
(c) $M_{41}$ and $C_{41}$.
(d) $M_{24}$ and $C_{24}$.


In Exercises 5—8, evaluate the determinant of the given matrix. If the matrix is invertible, use Equation 2 to find its inverse. 


oF 


Answer: 


11 22 
6.|4 
8 
7.)/—-5 7 
—7 =2 
Answer: 
pects seine 
59 59 
59; 
Ml) eieheg oes 
59 59 


1V2 6 
ae 


In Exercises 9—14, use the arrow technique to evaluate the determinant of the given matrix. 


9. \[ \begin{vmatrix} a - 3 & 5 \\ -3 & a - 2 \end{vmatrix} \]
Answer:
$a^2 - 5a + 21$
10.}—-2 7 6 
5 1 =—2 
3 8 
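11. \[ \begin{vmatrix} -2 & 1 & 4 \\ 3 & 5 & -7 \\ 1 & 6 & 2 \end{vmatrix} \]
Answer:
-65
12. \[ \begin{vmatrix} -1 & 1 & 2 \\ 3 & 0 & -5 \\ 1 & 7 & 2 \end{vmatrix} \]
13. \[ \begin{vmatrix} 3 & 0 & 0 \\ 2 & -1 & 5 \\ 1 & 9 & -4 \end{vmatrix} \]
Answer:
-123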
14.}c =-4 3 
S-  Je 
4c-—1 2 


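In Exercises 15-18, find all values of $\lambda$ for which det(A) = 0.


15. \[ A = \begin{bmatrix} \lambda - 2 & 1 \\ -5 & \lambda + 4 \end{bmatrix} \]
Answer:
$\lambda = 1$ or $\lambda = -3$

16. \[ A = \begin{bmatrix} \lambda - 4 & 0 & 0 \\ 0 & \lambda & 2 \\ 0 & 3 & \lambda - 1 \end{bmatrix} \]

17. \[ A = \begin{bmatrix} \lambda - 1 & 0 \\ 2 & \lambda + 1 \end{bmatrix} \]
Answer:
$\lambda = 1$ or $\lambda = -1$

18. \[ A = \begin{bmatrix} \lambda - 4 & 4 & 0 \\ -1 & \lambda & 0 \\ 0 & 0 & \lambda - 5 \end{bmatrix} \]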

19. Evaluate the determinant of the matrix in Exercise 13 by a cofactor expansion along 
(a) the first row. 
(b) the first column. 
(c) the second row. 
(d) the second column. 
(e) the third row. 
(f) the third column. 


Answer: 


(all parts) — 123 
20. Evaluate the determinant of the matrix in Exercise 12 by a cofactor expansion along 
(a) the first row. 
(b) the first column. 
(c) the second row. 
(d) the second column. 
(e) the third row. 
(f) the third column. 


In Exercises 21-26, evaluate det(A) by a cofactor expansion along a row or column of your choice.


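21. \[ A = \begin{bmatrix} -3 & 0 & 7 \\ 2 & 5 & 1 \\ -1 & 0 & 5 \end{bmatrix} \]
Answer:
-40
22. \[ A = \begin{bmatrix} 3 & 3 & 1 \\ 1 & 0 & -4 \\ 1 & -3 & 5 \end{bmatrix} \]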

Answer: 


k+1 k-1 7 


k+1 ik 


10 


2 


Answer: 


In Exercises 27-32, evaluate the determinant of the given matrix by inspection. 


Answer: 


Answer: 


i=) 


rer ners | 
TNT Rt NO 


- NMOS | 
rFHNOO NT OO 
-OCooO roo Oo 
Ss = 

a“ on 


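Answer:

6

32. \[ \begin{bmatrix} -3 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 40 & 10 & -1 & 0 \\ 100 & 200 & -23 & 3 \end{bmatrix} \]

33. Show that the value of the following determinant is independent of $\theta$.
\[ \begin{vmatrix} \sin\theta & \cos\theta & 0 \\ -\cos\theta & \sin\theta & 0 \\ \sin\theta - \cos\theta & \sin\theta + \cos\theta & 1 \end{vmatrix} \]

Answer:

The determinant is $\sin^2\theta + \cos^2\theta = 1$.

34. Show that the matrices


commute if and only if
\[ \begin{vmatrix} b & a - c \\ e & d - f \end{vmatrix} = 0 \]

35. By inspection, what is the relationship between the following determinants?
\[ d_1 = \begin{vmatrix} a & b & c \\ d & 1 & f \\ g & 0 & 1 \end{vmatrix} \quad \text{and} \quad d_2 = \begin{vmatrix} a + \lambda & b & c \\ d & 1 & f \\ g & 0 & 1 \end{vmatrix} \]
Answer:
$d_2 = d_1 + \lambda$

36. Show that
\[ \det(A) = \frac{1}{2} \begin{vmatrix} \operatorname{tr}(A) & 1 \\ \operatorname{tr}(A^2) & \operatorname{tr}(A) \end{vmatrix} \]
for every $2 \times 2$ matrix A.

37. What can you say about an nth-order determinant all of whose entries are 1? Explain your reasoning.

38. What is the maximum number of zeros that a $3 \times 3$ matrix can have without having a zero determinant? Explain your
reasoning.

39. What is the maximum number of zeros that a $4 \times 4$ matrix can have without having a zero determinant? Explain your
reasoning.

40. Prove that $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$ are collinear points if and only if
\[ \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix} = 0 \]

41. Prove that the equation of the line through the distinct points $(a_1, b_1)$ and $(a_2, b_2)$ can be written as
\[ \begin{vmatrix} x & y & 1 \\ a_1 & b_1 & 1 \\ a_2 & b_2 & 1 \end{vmatrix} = 0 \]

42. Prove that if A is upper triangular and $B_{ij}$ is the matrix that results when the ith row and jth column of A are deleted, then
$B_{ij}$ is upper triangular if $i < j$.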


True-False Exercises 


In parts (a)—(i) determine whether the statement is true or false, and justify your answer. 


(a) The determinant of the $2 \times 2$ matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is $ad + bc$.
Answer: 
False 


(b) Two square matrices A and B can have the same determinant only if they are the same size. 
Answer: 


False 


(c) The minor $M_{ij}$ is the same as the cofactor $C_{ij}$ if and only if $i + j$ is even.
Answer: 


True 


(d) If A is a $3 \times 3$ symmetric matrix, then $C_{ij} = C_{ji}$ for all i and j.
Answer: 


True 


(e) The value of a cofactor expansion of a matrix A is independent of the row or column chosen for the expansion. 
Answer: 


True 


(f) The determinant of a lower triangular matrix is the sum of the entries along its main diagonal. 
Answer: 


False 


(g) For every square matrix A and every scalar c, we have det(cA) =c det(A). 
Answer: 


False 
(h) For all square matrices A and B, we have $\det(A + B) = \det(A) + \det(B)$.


Answer: 


False 


(i) For every $2 \times 2$ matrix A, we have $\det(A^2) = (\det(A))^2$.


Answer: 


True 




2.2 Evaluating Determinants by Row Reduction 


In this section we will show how to evaluate a determinant by reducing the associated matrix to row echelon form. In 
general, this method requires less computation than cofactor expansion and hence is the method of choice for large 
matrices. 


A Basic Theorem 


We begin with a fundamental theorem that will lead us to an efficient procedure for evaluating the determinant of a square 
matrix of any size. 


THEOREM 2.2.1 


Let A be a square matrix. If A has a row of zeros or a column of zeros, then det(A) = 0.


Proof Since the determinant of A can be found by a cofactor expansion along any row or column, we can use the row or
column of zeros. Thus, if we let $C_1, C_2, \ldots, C_n$ denote the cofactors of A along that row or column, then it follows from
Formula 5 or 6 in Section 2.1 that
\[ \det(A) = 0 \cdot C_1 + 0 \cdot C_2 + \cdots + 0 \cdot C_n = 0 \]


The following useful theorem relates the determinant of a matrix and the determinant of its transpose. 


THEOREM 2.2.2 


Let A be a square matrix. Then $\det(A) = \det(A^T)$.


Because transposing a matrix changes its columns to
rows and its rows to columns, almost every theorem
about the rows of a determinant has a companion
version about columns, and vice versa.


Proof Since transposing a matrix changes its columns to rows and its rows to columns, the cofactor expansion of A
along any row is the same as the cofactor expansion of $A^T$ along the corresponding column. Thus, both have the same
determinant.


Elementary Row Operations 


The next theorem shows how an elementary row operation on a square matrix affects the value of its determinant. In 


place of a formal proof we have provided a table to illustrate the ideas in the 3 x 3 case (see Table 1). 


THEOREM 2.2.3 


Let A be an $n \times n$ matrix.


(a) If B is the matrix that results when a single row or single column of A is multiplied by a scalar k, then
$\det(B) = k \det(A)$.


(b) If B is the matrix that results when two rows or two columns of A are interchanged, then $\det(B) = -\det(A)$.


(c) If B is the matrix that results when a multiple of one row of A is added to another row or when a multiple of
one column is added to another column, then $\det(B) = \det(A)$.


The first panel of Table 1 shows that you can bring a 
common factor from any row (column) of a 
determinant through the determinant sign. This is a 
slightly different way of thinking about part (a) of 
Theorem 2.2.3. 


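Table 1

Relationship | Operation

$\begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = k \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$, that is, $\det(B) = k \det(A)$ | The first row of A is multiplied by k.

$\begin{vmatrix} a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = - \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$, that is, $\det(B) = -\det(A)$ | The first and second rows of A are interchanged.

$\begin{vmatrix} a_{11} + ka_{21} & a_{12} + ka_{22} & a_{13} + ka_{23} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$, that is, $\det(B) = \det(A)$ | A multiple of the second row of A is added to the first row.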
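The three relationships in Theorem 2.2.3 can be checked on a concrete matrix. The Python sketch below is illustrative only: the helper names `scale_row`, `swap_rows`, and `add_multiple` are hypothetical, rows are indexed from 0, and the sample matrix and multipliers are arbitrary choices.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(sub)
    return total


def scale_row(m, i, k):
    """Multiply row i by the scalar k."""
    return [[k * x for x in row] if r == i else list(row) for r, row in enumerate(m)]


def swap_rows(m, i, j):
    """Interchange rows i and j."""
    out = [list(row) for row in m]
    out[i], out[j] = out[j], out[i]
    return out


def add_multiple(m, src, dst, k):
    """Add k times row src to row dst."""
    out = [list(row) for row in m]
    out[dst] = [x + k * y for x, y in zip(m[dst], m[src])]
    return out


A = [[0, 1, 5], [3, -6, 9], [2, 6, 1]]
print(det(A))                          # reference value
print(det(scale_row(A, 0, 4)))         # part (a): multiplied by 4
print(det(swap_rows(A, 0, 1)))         # part (b): sign changes
print(det(add_multiple(A, 1, 0, 7)))   # part (c): unchanged
```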


We will verify the first equation in Table 1 and leave the other two for you. To start, note that the determinants on the two
sides of the equation differ only in the first row, so these determinants have the same cofactors, $C_{11}$, $C_{12}$, $C_{13}$, along that
row (since those cofactors depend only on the entries in the second two rows). Thus, expanding the left side by cofactors
along the first row yields
\[ \begin{aligned} \begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} &= ka_{11}C_{11} + ka_{12}C_{12} + ka_{13}C_{13} \\ &= k(a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13}) \\ &= k \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \end{aligned} \]


Elementary Matrices 


It will be useful to consider the special case of Theorem 2.2.3 in which $A = I_n$ is the $n \times n$ identity matrix and E (rather
than B) denotes the elementary matrix that results when the row operation is performed on $I_n$. In this special case
Theorem 2.2.3 implies the following result.


THEOREM 2.2.4


Let E be an $n \times n$ elementary matrix.

(a) If E results from multiplying a row of $I_n$ by a nonzero number k, then $\det(E) = k$.
(b) If E results from interchanging two rows of $I_n$, then $\det(E) = -1$.
(c) If E results from adding a multiple of one row of $I_n$ to another, then $\det(E) = 1$.


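EXAMPLE 1 Determinants of Elementary Matrices


The following determinants of elementary matrices, which are evaluated by inspection, illustrate Theorem
2.2.4.
\[ \begin{vmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} = 3, \qquad \begin{vmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{vmatrix} = -1, \qquad \begin{vmatrix} 1 & 0 & 0 & 7 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} = 1 \]
In the first, the second row of $I_4$ was multiplied by 3. In the second, the first and last rows of $I_4$ were interchanged. In the third, 7 times the last row of $I_4$ was added to the first row.


Observe that the determinant of an elementary
matrix cannot be zero.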

Matrices with Proportional Rows or Columns 


If a square matrix A has two proportional rows, then a row of zeros can be introduced by adding a suitable multiple of one
of the rows to the other. Similarly for columns. But adding a multiple of one row or column to another does not change
the determinant, so from Theorem 2.2.1, we must have det(A) = 0. This proves the following theorem.


THEOREM 2.2.5


If A is a square matrix with two proportional rows or two proportional columns, then det(A) = 0.


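EXAMPLE 2 Introducing Zero Rows


The following computation shows how to introduce a row of zeros when there are two proportional rows.
\[ \begin{vmatrix} 1 & 3 & -2 & 4 \\ 2 & 6 & -4 & 8 \\ 3 & 9 & 1 & 5 \\ 1 & 1 & 4 & 8 \end{vmatrix} = \begin{vmatrix} 1 & 3 & -2 & 4 \\ 0 & 0 & 0 & 0 \\ 3 & 9 & 1 & 5 \\ 1 & 1 & 4 & 8 \end{vmatrix} = 0 \]
The second row is 2 times the first, so we added -2 times the first row to the second to introduce a row of zeros.

Each of the following matrices has two proportional rows or columns; thus, each has a determinant of zero.
\[ \begin{bmatrix} -1 & 4 \\ -2 & 8 \end{bmatrix}, \qquad \begin{bmatrix} 1 & -2 & 7 \\ -4 & 8 & 5 \\ 2 & -4 & 3 \end{bmatrix}, \qquad \begin{bmatrix} 3 & -1 & 4 & -5 \\ 6 & -2 & 5 & 2 \\ 5 & 8 & 1 & 4 \\ -9 & 3 & -12 & 15 \end{bmatrix} \]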


Evaluating Determinants by Row Reduction 


We will now give a method for evaluating determinants that involves substantially less computation than cofactor 
expansion. The idea of the method is to reduce the given matrix to upper triangular form by elementary row operations, 
then compute the determinant of the upper triangular matrix (an easy computation), and then relate that determinant to 
that of the original matrix. Here is an example. 


EXAMPLE 3 Using Row Reduction to Evaluate a Determinant << 


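Evaluate det(A) where
\[ A = \begin{bmatrix} 0 & 1 & 5 \\ 3 & -6 & 9 \\ 2 & 6 & 1 \end{bmatrix} \]


Solution We will reduce A to row echelon form (which is upper triangular) and then apply Theorem
2.1.2.
\[ \begin{aligned} \det(A) = \begin{vmatrix} 0 & 1 & 5 \\ 3 & -6 & 9 \\ 2 & 6 & 1 \end{vmatrix} &= - \begin{vmatrix} 3 & -6 & 9 \\ 0 & 1 & 5 \\ 2 & 6 & 1 \end{vmatrix} && \text{The first and second rows of A were interchanged.} \\ &= -3 \begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 2 & 6 & 1 \end{vmatrix} && \text{A common factor of 3 from the first row was taken through the determinant sign.} \\ &= -3 \begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 10 & -5 \end{vmatrix} && \text{$-2$ times the first row was added to the third row.} \\ &= -3 \begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 0 & -55 \end{vmatrix} && \text{$-10$ times the second row was added to the third row.} \\ &= (-3)(-55) \begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{vmatrix} && \text{A common factor of $-55$ from the last row was taken through the determinant sign.} \\ &= (-3)(-55)(1) = 165 \end{aligned} \]


Even with today's fastest computers it would
take millions of years to calculate a $25 \times 25$
determinant by cofactor expansion, so
methods based on row reduction are often
used for large determinants. For determinants
of small size (such as those in this text),
cofactor expansion is often a reasonable
choice.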
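The row-reduction method can be sketched in Python as follows. This is one possible arrangement under stated assumptions, not the text's own procedure: the function name `det_by_row_reduction` is hypothetical, exact `Fraction` arithmetic is used to avoid rounding, and instead of factoring each pivot out of its row the sketch multiplies the running result by the pivot, which is equivalent to taking the product of the diagonal of the final upper triangular matrix (Theorem 2.1.2).

```python
from fractions import Fraction


def det_by_row_reduction(m):
    """Reduce m to upper triangular form, tracking how each row operation changes det."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Look for a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)         # no pivot: the triangular form has a zero on its diagonal
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result           # a row interchange changes the sign
        result *= a[col][col]          # diagonal entry of the triangular form
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            # Adding a multiple of one row to another leaves det unchanged.
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return result


print(det_by_row_reduction([[0, 1, 5], [3, -6, 9], [2, 6, 1]]))
```

For an $n \times n$ matrix this takes on the order of $n^3$ arithmetic operations, compared with roughly $n!$ for cofactor expansion.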


EXAMPLE 4 Using Column Operations to Evaluate a Determinant


Compute the determinant of
\[ A = \begin{bmatrix} 1 & 0 & 0 & 3 \\ 2 & 7 & 0 & 6 \\ 0 & 6 & 3 & 0 \\ 7 & 3 & 1 & -5 \end{bmatrix} \]


Solution This determinant could be computed as above by using elementary row operations to reduce A to
row echelon form, but we can put A in lower triangular form in one step by adding -3 times the first column
to the fourth to obtain
\[ \det(A) = \det \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 7 & 0 & 0 \\ 0 & 6 & 3 & 0 \\ 7 & 3 & 1 & -26 \end{bmatrix} = (1)(7)(3)(-26) = -546 \]


Example 4 points out that it is always wise to keep
an eye open for column operations that can shorten
computations.


Cofactor expansion and row or column operations can sometimes be used in combination to provide an effective method 
for evaluating determinants. The following example illustrates this idea. 


EXAMPLE 5 Row Operations and Cofactor Expansion 


Evaluate det(.A) where 


aM UA 
a 
Wet H 


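Solution By adding suitable multiples of the second row to the remaining rows, we obtain
\[ \begin{aligned} \det(A) &= \begin{vmatrix} 0 & -1 & 1 & 3 \\ 1 & 2 & -1 & 1 \\ 0 & 0 & 3 & 3 \\ 0 & 1 & 8 & 0 \end{vmatrix} \\ &= - \begin{vmatrix} -1 & 1 & 3 \\ 0 & 3 & 3 \\ 1 & 8 & 0 \end{vmatrix} && \text{Cofactor expansion along the first column.} \\ &= - \begin{vmatrix} -1 & 1 & 3 \\ 0 & 3 & 3 \\ 0 & 9 & 3 \end{vmatrix} && \text{We added the first row to the third row.} \\ &= -(-1) \begin{vmatrix} 3 & 3 \\ 9 & 3 \end{vmatrix} && \text{Cofactor expansion along the first column.} \\ &= -18 \end{aligned} \]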


Skills

• Know the effect of elementary row operations on the value of a determinant.
• Know the determinants of the three types of elementary matrices.
• Know how to introduce zeros into the rows or columns of a matrix to facilitate the evaluation of its determinant.
• Use row reduction to evaluate the determinant of a matrix.
• Use column operations to evaluate the determinant of a matrix.
• Combine the use of row reduction and cofactor expansion to evaluate the determinant of a matrix.


Exercise Set 2.2 


In Exercises 1-4, verify that $\det(A) = \det(A^T)$.


1 =? 3 
‘A= 


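5. \[ \begin{vmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -5 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} \]
Answer:
-5

6. \[ \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -5 & 0 & 1 \end{vmatrix} \]

7. \[ \begin{vmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} \]
Answer:
-1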

8.| 1 00 0 

uh 
0 3 0 0 
0 01 0 
0 00 1 

9. \[ \begin{vmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -9 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} \]
Answer:

1


In Exercises 10-17, evaluate the determinant of the given matrix by reducing the matrix to row echelon form. 


10.| 3 6 <9 
00 =—2 
—2 1 5 

11. 


0 
1 
3 


Answer: 


5 
12 1 =—3 
—2 4 1 
5 =—2 2 
13 3-6 9 
—2 F =—2 
0 1 5 
Answer: 
33 
14 1 —2 3 1 
5 -9 6 3 
—1 2-6 =—2 
2 8 6 1 
15.}2 13 1 
1011 
0210 
012 3 
Answer: 
6 
16. Oc 12 dr a4 
oe ug 
2 2 1 2 
at. 
3 3) 3 : 
12 
-3 3 0 60 
17. 1 3 1 3.03 
—2 —7 0 =—4 2 
0 0 1 0 1 
0 02 i aE 
0 0.0 11 
Answer: 
—2 


18. Repeat Exercises 10-13 by using a combination of row reduction and cofactor expansion. 


19. Repeat Exercises 14-17 by using a combination of row operations and cofactor expansion. 
Answer: 
Exercise 14: 39; Exercise 15: 6; Exercise 16: —* Exercise 17: —2 


In Exercises 20-27, evaluate the determinant, given that 


20. 


21. 


22. 


23. 


24. 


25. 


26. 


27. 


28. 


YT BD ES 
Se 


72 


g h i 
atg b+h c+i 
a e F. 


g h i 
Answer: 
-6 

2a 2e af 
g+3a h+3b i+ 3c 


—3a —3b —3c¢ 
ad @ ey 
g-—4d h—-4e i-4f 


Answer: 


18 
Show that 


det) 0 az2 az3 |= — 413422431 


(b) [0 0 0 ayy 
det| © © 423 424 


= 4142234328 41 
232 433 &34 
441 @42 443 44 
29. Use row reduction to show that
\[ \begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{vmatrix} = (b - a)(c - a)(c - b) \]


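In Exercises 30-33, confirm the identities without evaluating the determinants directly.


30. \[ \begin{vmatrix} a_1 + b_1 t & a_2 + b_2 t & a_3 + b_3 t \\ a_1 t + b_1 & a_2 t + b_2 & a_3 t + b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} = (1 - t^2) \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} \]

31. \[ \begin{vmatrix} a_1 & b_1 & a_1 + b_1 + c_1 \\ a_2 & b_2 & a_2 + b_2 + c_2 \\ a_3 & b_3 & a_3 + b_3 + c_3 \end{vmatrix} = \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} \]

32. \[ \begin{vmatrix} a_1 & b_1 + t a_1 & c_1 + r b_1 + s a_1 \\ a_2 & b_2 + t a_2 & c_2 + r b_2 + s a_2 \\ a_3 & b_3 + t a_3 & c_3 + r b_3 + s a_3 \end{vmatrix} = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} \]

33. \[ \begin{vmatrix} a_1 + b_1 & a_1 - b_1 & c_1 \\ a_2 + b_2 & a_2 - b_2 & c_2 \\ a_3 + b_3 & a_3 - b_3 & c_3 \end{vmatrix} = -2 \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} \]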


34. Find the determinant of the following matrix. 


\[ \begin{bmatrix} a & b & b & b \\ b & a & b & b \\ b & b & a & b \\ b & b & b & a \end{bmatrix} \]


In Exercises 35-36, show that det(A) = 0 without directly evaluating the determinant.


35. =—- € 3 4 
$ 25 4 
a= 1106 5 
4 -6 4 3 
36 —4 1 1 1 
1 —4 1 1 
A=| 1 —-4 1 1 
1 —4 1 
1 1 1 #1 =4 


True-False Exercises 


In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 


(a) If A is a $4 \times 4$ matrix and B is obtained from A by interchanging the first two rows and then interchanging the last two
rows, then $\det(B) = \det(A)$.


Answer: 


True 


(b) If A is a $3 \times 3$ matrix and B is obtained from A by multiplying the first column by 4 and multiplying the third column
by $\frac{3}{4}$, then $\det(B) = 3 \det(A)$.


Answer: 


True 


(c) If A is a $3 \times 3$ matrix and B is obtained from A by adding 5 times the first row to each of the second and third rows,
then $\det(B) = 25 \det(A)$.


Answer: 


False 


(d) If A is an $n \times n$ matrix and B is obtained from A by multiplying each row of A by its row number, then
\[ \det(B) = \frac{n(n+1)}{2} \det(A) \]


Answer: 


False 


(e) If A is a square matrix with two identical columns, then det(A) = 0.
Answer: 


True 


(f) If the sum of the second and fourth row vectors of a $6 \times 6$ matrix A is equal to the last row vector, then det(A) = 0.
Answer: 


True 




2.3 Properties of Determinants; Cramer's Rule 


In this section we will develop some fundamental properties of matrices, and we will use these results to derive a 
formula for the inverse of an invertible matrix and formulas for the solutions of certain kinds of linear systems. 


Basic Properties of Determinants 


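Suppose that A and B are $n \times n$ matrices and k is any scalar. We begin by considering possible relationships
between det(A), det(B), and
\[ \det(kA), \quad \det(A + B), \quad \text{and} \quad \det(AB) \]
Since a common factor of any row of a matrix can be moved through the determinant sign, and since each of the
n rows in kA has a common factor of k, it follows that
\[ \det(kA) = k^n \det(A) \tag{1} \]
For example,
\[ \begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ ka_{21} & ka_{22} & ka_{23} \\ ka_{31} & ka_{32} & ka_{33} \end{vmatrix} = k^3 \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \]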

Unfortunately, no simple relationship exists among det(A), det(B), and det(A + B). In particular, we emphasize
that det(A + B) will usually not be equal to det(A) + det(B). The following example illustrates this fact.


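EXAMPLE 1 $\det(A + B) \neq \det(A) + \det(B)$


Consider
\[ A = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}, \quad A + B = \begin{bmatrix} 4 & 3 \\ 3 & 8 \end{bmatrix} \]
We have det(A) = 1, det(B) = 8, and det(A + B) = 23; thus
\[ \det(A + B) \neq \det(A) + \det(B) \]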


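In spite of the previous example, there is a useful relationship concerning sums of determinants that is applicable
when the matrices involved are the same except for one row (column). For example, consider the following two
matrices that differ only in the second row:
\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} a_{11} & a_{12} \\ b_{21} & b_{22} \end{bmatrix} \]
Calculating the determinants of A and B we obtain
\[ \begin{aligned} \det(A) + \det(B) &= (a_{11}a_{22} - a_{12}a_{21}) + (a_{11}b_{22} - a_{12}b_{21}) \\ &= a_{11}(a_{22} + b_{22}) - a_{12}(a_{21} + b_{21}) \\ &= \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix} \end{aligned} \]
Thus
\[ \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \det \begin{bmatrix} a_{11} & a_{12} \\ b_{21} & b_{22} \end{bmatrix} = \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix} \]
This is a special case of the following general result.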


THEOREM 2.3.1 


Let A, B, and C be $n \times n$ matrices that differ only in a single row, say the rth, and assume that the rth row
of C can be obtained by adding corresponding entries in the rth rows of A and B. Then
\[ \det(C) = \det(A) + \det(B) \]
The same result holds for columns.


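EXAMPLE 2 Sums of Determinants


We leave it to you to confirm the following equality by evaluating the determinants.
\[ \det \begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 1 + 0 & 4 + 1 & 7 + (-1) \end{bmatrix} = \det \begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 1 & 4 & 7 \end{bmatrix} + \det \begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 0 & 1 & -1 \end{bmatrix} \]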
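The equality in Example 2 can be confirmed with a few lines of Python. This is a sketch only; `det` is a hypothetical helper that uses cofactor expansion along the first row. The last line also evaluates the determinant of the ordinary matrix sum A + B to show that it is a different quantity.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(sub)
    return total


A = [[1, 7, 5], [2, 0, 3], [1, 4, 7]]
B = [[1, 7, 5], [2, 0, 3], [0, 1, -1]]
# C agrees with A and B in rows 1 and 2; its third row is the sum of their third rows.
C = [A[0], A[1], [x + y for x, y in zip(A[2], B[2])]]

print(det(C), det(A) + det(B))     # equal, as Theorem 2.3.1 states

full_sum = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]
print(det(full_sum))               # det(A + B) is a different number
```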


Determinant of a Matrix Product 


Considering the complexity of the formulas for determinants and matrix multiplication, it would seem unlikely
that a simple relationship should exist between them. This is what makes the simplicity of our next result so
surprising. We will show that if A and B are square matrices of the same size, then
\[ \det(AB) = \det(A) \det(B) \tag{2} \]
The proof of this theorem is fairly intricate, so we will have to develop some preliminary results first. We begin
with the special case of (2) in which A is an elementary matrix. Because this special case is only a prelude to (2), we
call it a lemma.


LEMMA 2.3.2 


If B is an $n \times n$ matrix and E is an $n \times n$ elementary matrix, then
\[ \det(EB) = \det(E) \det(B) \]


Proof We will consider three cases, each in accordance with the row operation that produces the matrix E.


Case 1 If E results from multiplying a row of $I_n$ by k, then by Theorem 1.5.1, EB results from B by multiplying
the corresponding row by k; so from Theorem 2.2.3(a) we have
\[ \det(EB) = k \det(B) \]
But from Theorem 2.2.4(a) we have $\det(E) = k$, so
\[ \det(EB) = \det(E) \det(B) \]


Cases 2 and 3 The proofs of the cases where E results from interchanging two rows of $I_n$ or from adding a
multiple of one row to another follow the same pattern as Case 1 and are left as exercises.


Remark It follows by repeated applications of Lemma 2.3.2 that if B is an $n \times n$ matrix and $E_1, E_2, \ldots, E_r$ are
$n \times n$ elementary matrices, then
\[ \det(E_1 E_2 \cdots E_r B) = \det(E_1) \det(E_2) \cdots \det(E_r) \det(B) \tag{3} \]


Determinant Test for Invertibility 


Our next theorem provides an important criterion for determining whether a matrix is invertible. It also takes us a 
step closer to establishing Formula 2. 


THEOREM 2.3.3 


A square matrix A is invertible if and only if det(A) # 0. 


Proof Let R be the reduced row echelon form of A. As a preliminary step, we will show that det(A) and det(R)
are both zero or both nonzero: Let $E_1, E_2, \ldots, E_r$ be the elementary matrices that correspond to the elementary
row operations that produce R from A. Thus


\[ R = E_r \cdots E_2E_1A \]


and from Formula 3,


\[ \det(R) = \det(E_r)\cdots\det(E_2)\det(E_1)\det(A) \tag{4} \]


We pointed out in the margin note that accompanies Theorem 2.2.4 that the determinant of an elementary matrix
is nonzero. Thus, it follows from Formula 4 that det(A) and det(R) are either both zero or both nonzero, which
sets the stage for the main part of the proof. If we assume first that A is invertible, then it follows from Theorem
1.6.4 that $R = I$ and hence that $\det(R) = 1$ ($\neq 0$). This, in turn, implies that $\det(A) \neq 0$, which is what we
wanted to show.


It follows from Theorems 2.3.3 and 2.2.5 that a square matrix with two proportional
rows or two proportional columns is not
invertible.


Conversely, assume that $\det(A) \neq 0$. It follows from this that $\det(R) \neq 0$, which tells us that R cannot have a row
of zeros. Thus, it follows from Theorem 1.4.3 that $R = I$ and hence that A is invertible by Theorem 1.6.4.


EXAMPLE 3 Determinant Test for Invertibility


Since the first and third rows of 


are proportional, $\det(A) = 0$. Thus A is not invertible.
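The determinant test of Theorem 2.3.3 is easy to sketch in code. In the Python fragment below, the helper names `det` and `is_invertible` are illustrative assumptions, and the matrix `singular` is a hypothetical choice with proportional first and third rows; it is not claimed to be the matrix printed in Example 3.

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )

def is_invertible(m):
    """Theorem 2.3.3: a square matrix is invertible iff its determinant is nonzero."""
    return det(m) != 0

# Hypothetical matrix whose third row is twice its first row.
singular = [[1, 2, 3], [1, 0, 1], [2, 4, 6]]
# Hypothetical matrix with determinant 1.
regular = [[2, 1], [1, 1]]

print(is_invertible(singular), is_invertible(regular))
```

The exact comparison with zero is reasonable here only because the entries are integers; with floating-point entries a tolerance would be needed.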


We are now ready for the main result concerning products of matrices. 


THEOREM 2.3.4 


If A and B are square matrices of the same size, then 


\[ \det(AB) = \det(A)\det(B) \]


Proof We divide the proof into two cases that depend on whether or not A is invertible. If the matrix A is not
invertible, then by Theorem 1.6.5 neither is the product AB. Thus, from Theorem 2.3.3, we have
$\det(AB) = 0$ and $\det(A) = 0$, so it follows that $\det(AB) = \det(A)\det(B)$.


Augustin Louis Cauchy (1789-1857) 


Historical Note In 1815 the great French mathematician Augustin Cauchy published a landmark paper 
in which he gave the first systematic and modern treatment of determinants. It was in that paper that 
Theorem 2.3.4 was stated and proved in full generality for the first time. Special cases of the theorem had 
been stated and proved earlier, but it was Cauchy who made the final jump. 

[Image: The Granger Collection, New York] 


Now assume that A is invertible. By Theorem 1.6.4, the matrix A is expressible as a product of elementary
matrices, say


\[ A = E_1E_2\cdots E_r \tag{5} \]


so
\[ AB = E_1E_2\cdots E_rB \]
Applying 3 to this equation yields
\[ \det(AB) = \det(E_1)\det(E_2)\cdots\det(E_r)\det(B) \]
and applying 3 again yields
\[ \det(AB) = \det(E_1E_2\cdots E_r)\det(B) \]
which, from 5, can be written as $\det(AB) = \det(A)\det(B)$.


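EXAMPLE 4 Verifying That det(AB) = det(A) det(B)


Consider the matrices
\[
A = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} -1 & 3 \\ 5 & 8 \end{bmatrix}, \quad
AB = \begin{bmatrix} 2 & 17 \\ 3 & 14 \end{bmatrix}
\]
We leave it for you to verify that


\[ \det(A) = 1, \quad \det(B) = -23, \quad \text{and} \quad \det(AB) = -23 \]
Thus $\det(AB) = \det(A)\det(B)$, as guaranteed by Theorem 2.3.4.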
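The verification requested in Example 4 can be sketched in a few lines of Python. The helper names `det2` and `matmul` are assumptions introduced only for this illustration; the 2 x 2 determinant is computed directly as ad - bc.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix [[a, b], [c, d]] as ad - bc."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

A = [[3, 1], [2, 1]]
B = [[-1, 3], [5, 8]]
AB = matmul(A, B)

print(AB)                          # the product matrix
print(det2(A), det2(B), det2(AB))  # the three determinants
```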


The following theorem gives a useful relationship between the determinant of an invertible matrix and the 
determinant of its inverse. 




THEOREM 2.3.5 


If A is invertible, then 


\[ \det(A^{-1}) = \frac{1}{\det(A)} \]


Proof Since $A^{-1}A = I$, it follows that $\det(A^{-1}A) = \det(I)$. Therefore, we must have $\det(A^{-1})\det(A) = 1$.
Since $\det(A) \neq 0$, the proof can be completed by dividing through by det(A).
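Theorem 2.3.5 can be illustrated with exact rational arithmetic. The sketch below is an assumption-laden example rather than part of the proof: it takes a hypothetical 2 x 2 matrix, inverts it with the usual 2 x 2 inverse formula, and compares det of the inverse with 1/det. The names `det2` and `inverse2` are illustrative.

```python
from fractions import Fraction

def det2(m):
    """Determinant of a 2 x 2 matrix [[a, b], [c, d]] as ad - bc."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inverse2(m):
    """Inverse of an invertible 2 x 2 matrix, with exact Fraction entries."""
    d = det2(m)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [
        [Fraction(m[1][1], d), Fraction(-m[0][1], d)],
        [Fraction(-m[1][0], d), Fraction(m[0][0], d)],
    ]

M = [[-1, 3], [5, 8]]  # hypothetical invertible matrix, det(M) = -23
M_inv = inverse2(M)

print(det2(M_inv), Fraction(1, det2(M)))
```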


Adjoint of a Matrix 


In a cofactor expansion we compute det(A) by multiplying the entries in a row or column by their cofactors and
adding the resulting products. It turns out that if one multiplies the entries in any row by the corresponding 
cofactors from a different row, the sum of these products is always zero. (This result also holds for columns.) 
Although we omit the general proof, the next example illustrates the idea of the proof in a special case. 


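It follows from Theorems 2.3.5 and 2.1.2 that, for an invertible triangular matrix A,
\[ \det(A^{-1}) = \frac{1}{a_{11}} \cdot \frac{1}{a_{22}} \cdots \frac{1}{a_{nn}} \]
Moreover, by using the adjoint formula it is
possible to show that
\[ \frac{1}{a_{11}}, \frac{1}{a_{22}}, \ldots, \frac{1}{a_{nn}} \]
are actually the successive diagonal entries of
$A^{-1}$ (compare A and $A^{-1}$ in Example 3 of
Section 1.7).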


EXAMPLE 5 Entries and Cofactors from Different Rows


Let
\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \]
Consider the quantity
\[ a_{11}C_{31} + a_{12}C_{32} + a_{13}C_{33} \]
that is formed by multiplying the entries in the first row by the cofactors of the corresponding entries
in the third row and adding the resulting products. We can show that this quantity is equal to zero by
the following trick: Construct a new matrix $A'$ by replacing the third row of A with another copy of the
first row. That is,


\[ A' = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \end{bmatrix} \]
Let $C'_{31}$, $C'_{32}$, $C'_{33}$ be the cofactors of the entries in the third row of $A'$. Since the first two rows of A
and $A'$ are the same, and since the computations of $C_{31}$, $C_{32}$, $C_{33}$, $C'_{31}$, $C'_{32}$, and $C'_{33}$ involve only
entries from the first two rows of A and $A'$, it follows that


\[ C_{31} = C'_{31}, \quad C_{32} = C'_{32}, \quad C_{33} = C'_{33} \]


Since $A'$ has two identical rows, it follows from 3 that
\[ \det(A') = 0 \tag{6} \]
On the other hand, evaluating $\det(A')$ by cofactor expansion along the third row gives
\[ \det(A') = a_{11}C'_{31} + a_{12}C'_{32} + a_{13}C'_{33} = a_{11}C_{31} + a_{12}C_{32} + a_{13}C_{33} \tag{7} \]


From 6 and 7 we obtain
\[ a_{11}C_{31} + a_{12}C_{32} + a_{13}C_{33} = 0 \]
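The conclusion of Example 5 can be checked on a concrete matrix. The Python sketch below uses a hypothetical 3 x 3 integer matrix and illustrative helper names (`det`, `cofactor`); it multiplies first-row entries by third-row cofactors, and, for contrast, third-row entries by third-row cofactors.

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )

def cofactor(m, i, j):
    """Cofactor C_ij (0-based i, j): signed determinant of the minor."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)

A = [[3, 2, -1], [1, 6, 3], [2, -4, 0]]  # hypothetical test matrix

# Entries of row 1 times cofactors of row 3: expected to vanish.
mixed = sum(A[0][k] * cofactor(A, 2, k) for k in range(3))
# Entries of row 3 times cofactors of row 3: the ordinary cofactor expansion.
same = sum(A[2][k] * cofactor(A, 2, k) for k in range(3))

print(mixed, same, det(A))
```

The first sum is zero, while the second reproduces det(A).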


DEFINITION 1 


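If A is any $n \times n$ matrix and $C_{ij}$ is the cofactor of $a_{ij}$, then the matrix


\[
\begin{bmatrix}
C_{11} & C_{12} & \cdots & C_{1n} \\
C_{21} & C_{22} & \cdots & C_{2n} \\
\vdots & \vdots & & \vdots \\
C_{n1} & C_{n2} & \cdots & C_{nn}
\end{bmatrix}
\]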


is called the matrix of cofactors from A. The transpose of this matrix is called the adjoint of A and is 
denoted by adj(A). 


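EXAMPLE 6 Adjoint of a 3 × 3 Matrix


Let
\[ A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & 6 & 3 \\ 2 & -4 & 0 \end{bmatrix} \]


The cofactors of A are
\[
\begin{array}{lll}
C_{11} = 12 & C_{12} = 6 & C_{13} = -16 \\
C_{21} = 4 & C_{22} = 2 & C_{23} = 16 \\
C_{31} = 12 & C_{32} = -10 & C_{33} = 16
\end{array}
\]


so the matrix of cofactors is


\[ \begin{bmatrix} 12 & 6 & -16 \\ 4 & 2 & 16 \\ 12 & -10 & 16 \end{bmatrix} \]
and the adjoint of A is
\[ \operatorname{adj}(A) = \begin{bmatrix} 12 & 4 & 12 \\ 6 & 2 & -10 \\ -16 & 16 & 16 \end{bmatrix} \]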
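The computation in Example 6 can be reproduced with a short Python sketch. The helper names `det`, `cofactor`, and `adjoint` are illustrative assumptions; the adjoint is formed exactly as in Definition 1, by transposing the matrix of cofactors.

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )

def cofactor(m, i, j):
    """Cofactor C_ij (0-based i, j): signed determinant of the minor."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)

def adjoint(m):
    """Transpose of the matrix of cofactors."""
    n = len(m)
    cofactors = [[cofactor(m, i, j) for j in range(n)] for i in range(n)]
    return [list(column) for column in zip(*cofactors)]

A = [[3, 2, -1], [1, 6, 3], [2, -4, 0]]
for row in adjoint(A):
    print(row)
```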


Leonard Eugene Dickson (1874-1954) 


Historical Note The use of the term adjoint for the transpose of the matrix of cofactors appears to have 


been introduced by the American mathematician L. E. Dickson in a research paper that he published in 
1902. 


[Image: Courtesy of the American Mathematical Society] 


In Theorem 1.4.5 we gave a formula for the inverse of a $2 \times 2$ invertible matrix. Our next theorem extends that
result to $n \times n$ invertible matrices.


THEOREM 2.3.6 Inverse of a Matrix Using Its Adjoint 


If A is an invertible matrix, then


\[ A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A) \tag{8} \]


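Proof We show first that


\[ A \operatorname{adj}(A) = \det(A) I \]


Consider the product


\[
A \operatorname{adj}(A) =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{in} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix}
C_{11} & C_{21} & \cdots & C_{j1} & \cdots & C_{n1} \\
C_{12} & C_{22} & \cdots & C_{j2} & \cdots & C_{n2} \\
\vdots & \vdots & & \vdots & & \vdots \\
C_{1n} & C_{2n} & \cdots & C_{jn} & \cdots & C_{nn}
\end{bmatrix}
\]


The entry in the ith row and jth column of the product A adj(A) is
\[ a_{i1}C_{j1} + a_{i2}C_{j2} + \cdots + a_{in}C_{jn} \tag{9} \]
(the product of the ith row of A with the jth column of adj(A)).


If $i = j$, then 9 is the cofactor expansion of det(A) along the ith row of A (Theorem 2.1.1), and if $i \neq j$, then the
a's and the cofactors come from different rows of A, so the value of 9 is zero. Therefore,


\[
A \operatorname{adj}(A) =
\begin{bmatrix}
\det(A) & 0 & \cdots & 0 \\
0 & \det(A) & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \det(A)
\end{bmatrix}
= \det(A) I \tag{10}
\]


Since A is invertible, $\det(A) \neq 0$. Therefore, Equation 10 can be rewritten as


\[ \frac{1}{\det(A)} [A \operatorname{adj}(A)] = I \quad \text{or} \quad A \left[ \frac{1}{\det(A)} \operatorname{adj}(A) \right] = I \]
Multiplying both sides on the left by $A^{-1}$ yields


\[ A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A) \]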


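EXAMPLE 7 Using the Adjoint to Find an Inverse Matrix
Use Formula 8 to find the inverse of the matrix A in Example 6.


Solution We leave it for you to check that $\det(A) = 64$. Thus


\[
A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)
= \frac{1}{64} \begin{bmatrix} 12 & 4 & 12 \\ 6 & 2 & -10 \\ -16 & 16 & 16 \end{bmatrix}
= \begin{bmatrix}
\frac{12}{64} & \frac{4}{64} & \frac{12}{64} \\
\frac{6}{64} & \frac{2}{64} & -\frac{10}{64} \\
-\frac{16}{64} & \frac{16}{64} & \frac{16}{64}
\end{bmatrix}
\]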
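The adjoint formula for the inverse translates directly into code. The following Python sketch is an illustration under stated assumptions, not an efficient algorithm: the helper names are hypothetical, determinants are computed by cofactor expansion, and `Fraction` keeps the entries exact so that the product with A can be compared with the identity matrix.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )

def adjoint(m):
    """Transpose of the matrix of cofactors."""
    n = len(m)
    def cofactor(i, j):
        minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
        return (-1) ** (i + j) * det(minor)
    return [[cofactor(j, i) for j in range(n)] for i in range(n)]

def inverse(m):
    """Inverse as (1/det) times the adjoint, for a matrix with nonzero determinant."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(entry, d) for entry in row] for row in adjoint(m)]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

A = [[3, 2, -1], [1, 6, 3], [2, -4, 0]]
A_inv = inverse(A)
print(A_inv)
print(matmul(A, A_inv))  # should be the identity matrix
```

Note that `Fraction` reduces each entry to lowest terms, so 12/64 is displayed as 3/16.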


Cramer's Rule 


Our next theorem uses the formula for the inverse of an invertible matrix to produce a formula, called Cramer's
rule, for the solution of a linear system $A\mathbf{x} = \mathbf{b}$ of n equations in n unknowns in the case where the coefficient
matrix A is invertible (or, equivalently, when $\det(A) \neq 0$).


THEOREM 2.3.7 Cramer's Rule 


If $A\mathbf{x} = \mathbf{b}$ is a system of n linear equations in n unknowns such that $\det(A) \neq 0$, then the system has a
unique solution. This solution is


\[ x_1 = \frac{\det(A_1)}{\det(A)}, \quad x_2 = \frac{\det(A_2)}{\det(A)}, \quad \ldots, \quad x_n = \frac{\det(A_n)}{\det(A)} \]
where $A_j$ is the matrix obtained by replacing the entries in the jth column of A by the entries in the matrix
\[ \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} \]


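Proof If $\det(A) \neq 0$, then A is invertible, and by Theorem 1.6.2, $\mathbf{x} = A^{-1}\mathbf{b}$ is the unique solution of $A\mathbf{x} = \mathbf{b}$.
Therefore, by Theorem 2.3.6 we have


\[
\mathbf{x} = A^{-1}\mathbf{b} = \frac{1}{\det(A)} \operatorname{adj}(A)\mathbf{b}
= \frac{1}{\det(A)}
\begin{bmatrix}
C_{11} & C_{21} & \cdots & C_{n1} \\
C_{12} & C_{22} & \cdots & C_{n2} \\
\vdots & \vdots & & \vdots \\
C_{1n} & C_{2n} & \cdots & C_{nn}
\end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
\]
Multiplying the matrices out gives
\[
\mathbf{x} = \frac{1}{\det(A)}
\begin{bmatrix}
b_1C_{11} + b_2C_{21} + \cdots + b_nC_{n1} \\
b_1C_{12} + b_2C_{22} + \cdots + b_nC_{n2} \\
\vdots \\
b_1C_{1n} + b_2C_{2n} + \cdots + b_nC_{nn}
\end{bmatrix}
\]
The entry in the jth row of $\mathbf{x}$ is therefore
\[ x_j = \frac{b_1C_{1j} + b_2C_{2j} + \cdots + b_nC_{nj}}{\det(A)} \tag{11} \]
Now let
\[
A_j =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1\,j-1} & b_1 & a_{1\,j+1} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2\,j-1} & b_2 & a_{2\,j+1} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{n\,j-1} & b_n & a_{n\,j+1} & \cdots & a_{nn}
\end{bmatrix}
\]


Since $A_j$ differs from A only in the jth column, it follows that the cofactors of entries $b_1, b_2, \ldots, b_n$ in $A_j$ are the
same as the cofactors of the corresponding entries in the jth column of A. The cofactor expansion of $\det(A_j)$
along the jth column is therefore


\[ \det(A_j) = b_1C_{1j} + b_2C_{2j} + \cdots + b_nC_{nj} \]


Substituting this result in 11 gives
\[ x_j = \frac{\det(A_j)}{\det(A)} \]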


EXAMPLE 8 Using Cramer's Rule to Solve a Linear System


Use Cramer's rule to solve


\[
\begin{aligned}
x_1 \phantom{{}+ 0x_2} + 2x_3 &= 6 \\
-3x_1 + 4x_2 + 6x_3 &= 30 \\
-x_1 - 2x_2 + 3x_3 &= 8
\end{aligned}
\]


Gabriel Cramer (1704-1752) 


Historical Note Variations of Cramer's rule were fairly well known before the Swiss 
mathematician discussed it in work he published in 1750. It was Cramer's superior notation 
that popularized the method and led mathematicians to attach his name to it. 

[Image: Granger Collection] 


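Solution
\[
A = \begin{bmatrix} 1 & 0 & 2 \\ -3 & 4 & 6 \\ -1 & -2 & 3 \end{bmatrix}, \quad
A_1 = \begin{bmatrix} 6 & 0 & 2 \\ 30 & 4 & 6 \\ 8 & -2 & 3 \end{bmatrix},
\]
\[
A_2 = \begin{bmatrix} 1 & 6 & 2 \\ -3 & 30 & 6 \\ -1 & 8 & 3 \end{bmatrix}, \quad
A_3 = \begin{bmatrix} 1 & 0 & 6 \\ -3 & 4 & 30 \\ -1 & -2 & 8 \end{bmatrix}
\]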


For n > 3, it is usually more efficient to
solve a linear system with n equations in n
unknowns by Gauss-Jordan elimination
than by Cramer's rule. The main use of
Cramer's rule is for obtaining properties of
solutions of a linear system without
actually solving the system.


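Therefore,


\[
x_1 = \frac{\det(A_1)}{\det(A)} = \frac{-40}{44} = \frac{-10}{11}, \quad
x_2 = \frac{\det(A_2)}{\det(A)} = \frac{72}{44} = \frac{18}{11}, \quad
x_3 = \frac{\det(A_3)}{\det(A)} = \frac{152}{44} = \frac{38}{11}
\]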
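Cramer's rule, as stated in Theorem 2.3.7, can be sketched in Python and applied to the system of Example 8. The function name `cramer` and the helper `det` are illustrative assumptions; each matrix A_j is built by replacing column j of A with b, and exact fractions are used so the results can be compared with the values above.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )

def cramer(a, b):
    """Solve a x = b by Cramer's rule; requires det(a) != 0."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's rule does not apply: det(A) = 0")
    solution = []
    for j in range(len(a)):
        # A_j: replace the jth column of A by the entries of b.
        a_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_j), d))
    return solution

A = [[1, 0, 2], [-3, 4, 6], [-1, -2, 3]]
b = [6, 30, 8]
x = cramer(A, b)
print(x)
```

Because each unknown costs one n x n determinant, this sketch is meant for checking small examples by hand, in line with the margin remark that elimination is usually more efficient for larger systems.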


Equivalence Theorem 


In Theorem 1.6.4 we listed five results that are equivalent to the invertibility of a matrix A. We conclude this 
section by merging Theorem 2.3.3 with that list to produce the following theorem that relates all of the major 
topics we have studied thus far. 


THEOREM 2.3.8 Equivalent Statements 


If A is an $n \times n$ matrix, then the following statements are equivalent.
(a) A is invertible.

(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.

(c) The reduced row echelon form of A is $I_n$.

(d) A can be expressed as a product of elementary matrices.

(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.

(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.

(g) $\det(A) \neq 0$.


OPTIONAL 


We now have all of the machinery necessary to prove the following two results, which we stated without proof in 
Theorem 1.7.1: 


e Theorem 1.7.1(c) A triangular matrix is invertible if and only if its diagonal entries are all nonzero. 


e Theorem 1.7.1(d) The inverse of an invertible lower triangular matrix is lower triangular, and the inverse of an 
invertible upper triangular matrix is upper triangular. 


Proof of Theorem 1.7.1(c) Let $A = [a_{ij}]$ be a triangular matrix, so that its diagonal entries are


\[ a_{11}, a_{22}, \ldots, a_{nn} \]
From Theorem 2.1.2, the matrix A is invertible if and only if
\[ \det(A) = a_{11}a_{22}\cdots a_{nn} \]


is nonzero, which is true if and only if the diagonal entries are all nonzero.


Proof of Theorem 1.7.1(d) We will prove the result for upper triangular matrices and leave the lower
triangular case for you. Assume that A is upper triangular and invertible. Since


\[ A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A) \]


we can prove that $A^{-1}$ is upper triangular by showing that adj(A) is upper triangular or, equivalently, that the
matrix of cofactors is lower triangular. We can do this by showing that every cofactor $C_{ij}$ with $i < j$ (i.e., above
the main diagonal) is zero. Since


\[ C_{ij} = (-1)^{i+j} M_{ij} \]


it suffices to show that each minor $M_{ij}$ with $i < j$ is zero. For this purpose, let $B_{ij}$ be the matrix that results when
the ith row and jth column of A are deleted, so


\[ M_{ij} = \det(B_{ij}) \tag{12} \]


From the assumption that $i < j$, it follows that $B_{ij}$ is upper triangular (see Figure 1.7.1). Since A is upper
triangular, its (i + 1)-st row begins with at least i zeros. But the ith row of $B_{ij}$ is the (i + 1)-st row of A with the
entry in the jth column removed. Since $i < j$, none of the first i zeros is removed by deleting the jth column; thus
the ith row of $B_{ij}$ starts with at least i zeros, which implies that this row has a zero on the main diagonal. It now
follows from Theorem 2.1.2 that $\det(B_{ij}) = 0$ and from 12 that $M_{ij} = 0$.


Concept Review 

• Determinant test for invertibility
• Matrix of cofactors
• Adjoint of a matrix
• Cramer's rule
• Equivalent statements about an invertible matrix


Skills 


• Know how determinants behave with respect to basic arithmetic operations, as given in Equation 1,
Theorem 2.3.1, Lemma 2.3.2, and Theorem 2.3.4.
• Use the determinant to test a matrix for invertibility.
• Know how det(A) and $\det(A^{-1})$ are related.
• Compute the matrix of cofactors for a square matrix A.
• Compute adj(A) for a square matrix A.
• Use the adjoint of an invertible matrix to find its inverse.
• Use Cramer's rule to solve linear systems of equations.
• Know the equivalent characterizations of an invertible matrix given in Theorem 2.3.8.


Exercise Set 2.3 


In Exercises 1-4, verify that $\det(kA) = k^n\det(A)$.


2 2 2 
‘A= _k= =—4 
5 | 
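3. \[ A = \begin{bmatrix} 2 & -1 & 3 \\ 3 & 2 & 1 \\ 1 & 4 & 5 \end{bmatrix}, \quad k = -2 \]
4. \[ A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 0 & 1 & -2 \end{bmatrix}, \quad k = 3 \]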


In Exercises 5-6, verify that $\det(AB) = \det(BA)$ and determine whether the equality
$\det(A + B) = \det(A) + \det(B)$ holds.


5. 210 fi =1 3 

A=|3 4 0] and B=|/7 1 2 
002 § 01 

6 =] 2 al ad 

A= 0 -1| and B=|1 1 3 
—2 2 0 3-1 


In Exercises 7-14, use determinants to decide whether the given matrix is invertible. 
7. 2 35 5 
A=|-1 -1 0 

2 4 3 
Answer: 


Invertible 


8. 2. 0 3 
A= 0 3 2 
—2 0 —4 
9. 2—3 5 
A=|0 1 =—3 
0 0 2 
Answer: 
Invertible 
10. —3 0 1 
A= 5 0 6 
8 0 3 
11. 4 2 8 
A=/=-2 1 =—4 
5: a) 6 
Answer: 
Not invertible 
12. 1 0 —1 
A=|9 —1 4 
8 9 —] 
13. 


z 0 0 
A=| 8 10 
—5 3 6 


Answer: 
Invertible 
14. y2 -/7 0 
A=!3/> ~3/7 0 
5 =—93 0 


In Exercises 15—18, find the values of k for which A is invertible. 


15.,_|4-3 <2 
a=[*o, ae 


Answer: 


5+ 17 
ks 5 


16. ,_[k 2 
4-13 


ths 

I 
er Wo 
WJ et ho 

a 


ths 
I 
oF 
Ne BO 
FO 


In Exercises 19—23, decide whether the given matrix is invertible, and if so, use the adjoint method to find its 
inverse. 


19, 


Answer: 
3 —5 =—5 
At=|-3 4 5 
2—<—2 =—3 
20. 2-0" <3 
A=| 03 2 
—2 0 =—4 
vAe 2—<—3 5 
A=|0 1 =—3 
0 oO 2 
Answer: 
1. 3 
22 ! 
—1 S 
Aw=|9 15 
ng 
005 
22. 20 0 
A=| 81 0 
—5 3 6 
23. 1311 
2 ree 2 
A=! 389 
13 22 


bs o =t “O° -S 
Ae 

ae 

6 0 a7 


In Exercises 24—29, solve by Cramer's rule, where it applies. 


24, 7x, — 2x, = 3 
3x1 + x2 = 5 
25. 4x + 5y 
lix + y+ 2 
x + Sy + 2 
Answer 
2. x = 4y + 2 6 
4x = yr & = —1 
2x + 2y =— 32 —20 
27. %1 — 3x2 + *3 = 4 
2x, — x3 = =2 
4x, =— 3x3 = 0 
Answer: 
ee ee eee 
28.—%, — 4%2 + 2x3 + xq = =32 
2X, = 2 + 7x3 + 9x4 = 14 
—x; + x2 + 3x3 + x4 = 11 
xy = 2x37 + 7*3 = 4xqg = =4 
29, 3x, = x2 + x3 4 
=x; + %?x2 = 2x3 = 1 
2x1 + 6x2 = *3 = 5 
Answer: 


Cramer's rule does not apply. 


30. Show that the matrix


\[ A = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \]


is invertible for all values of $\theta$; then find $A^{-1}$ using Theorem 2.3.6.


31. Use Cramer's rule to solve for y without solving for the unknowns x, z, and w. 


3x + Fy = z+ w= 1 
Tx + 3y = S52 + 8wW = =3 
x 4 y 4 z+ 2 = Ss 
Answer: 
y=0 


32. Let $A\mathbf{x} = \mathbf{b}$ be the system in Exercise 31.
(a) Solve by Cramer's rule.
(b) Solve by Gauss-Jordan elimination.
(c) Which method involves fewer computations?
33. Prove that if $\det(A) = 1$ and all the entries in A are integers, then all the entries in $A^{-1}$ are integers.


34. Let $A\mathbf{x} = \mathbf{b}$ be a system of n linear equations in n unknowns with integer coefficients and integer constants.
Prove that if $\det(A) = 1$, the solution $\mathbf{x}$ has integer entries.


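35. Let

\[ A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \]

Assuming that $\det(A) = -7$, find

(a) $\det(3A)$

(b) $\det(A^{-1})$

(c) $\det(2A^{-1})$

(d) $\det((2A)^{-1})$

(e) $\det\begin{bmatrix} a & g & d \\ b & h & e \\ c & i & f \end{bmatrix}$

Answer:
(a) $-189$
(b) $-\frac{1}{7}$
(c) $-\frac{8}{7}$
(d) $-\frac{1}{56}$
(e) 7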


36. In each part, find the determinant given that A is a $4 \times 4$ matrix for which $\det(A) = -2$.
(a) $\det(-A)$


(b) $\det(A^{-1})$
(c) det(2.47) 
(d) det(.A) 


37. In each part, find the determinant given that A is a $3 \times 3$ matrix for which $\det(A) = 7$.
(a) $\det(3A)$

(b) $\det(A^{-1})$
(c) $\det(2A^{-1})$
(d) $\det((2A)^{-1})$


Answer:


(a) 189
(b) $\frac{1}{7}$
(c) $\frac{8}{7}$
(d) $\frac{1}{56}$
38. Prove that a square matrix A is invertible if and only if $A^TA$ is invertible.
39. Show that if A is a square matrix, then $\det(A^TA) = \det(AA^T)$.


True-False Exercises 


In parts (a)–(l) determine whether the statement is true or false, and justify your answer.
(a) If A is a $3 \times 3$ matrix, then $\det(2A) = 2\det(A)$.
Answer: 


False 


(b) If A and B are square matrices of the same size such that $\det(A) = \det(B)$, then $\det(A + B) = 2\det(A)$.
Answer: 


False 
(c) If A and B are square matrices of the same size and A is invertible, then 
\[ \det(A^{-1}BA) = \det(B) \]
Answer: 


True 


(d) A square matrix A is invertible if and only if det(A) = 0. 
Answer: 


False 


(e) The matrix of cofactors of A is precisely $[\operatorname{adj}(A)]^T$.


Answer: 


True 


(f) For every $n \times n$ matrix A, we have


\[ A \cdot \operatorname{adj}(A) = (\det(A)) I_n \]
Answer: 


True 


(g) If A is a square matrix and the linear system $A\mathbf{x} = \mathbf{0}$ has multiple solutions for $\mathbf{x}$, then $\det(A) = 0$.
Answer: 


True 


(h) If A is an $n \times n$ matrix and there exists an $n \times 1$ matrix $\mathbf{b}$ such that the linear system $A\mathbf{x} = \mathbf{b}$ has no solutions,
then the reduced row echelon form of A cannot be $I_n$.


Answer: 


True 


(i) If E is an elementary matrix, then $E\mathbf{x} = \mathbf{0}$ has only the trivial solution.
Answer: 


True 


(j) If A is an invertible matrix, then the linear system $A\mathbf{x} = \mathbf{0}$ has only the trivial solution if and only if the linear
system $A^{-1}\mathbf{x} = \mathbf{0}$ has only the trivial solution.


Answer: 


True 


(k) If A is invertible, then adj(A) must also be invertible.
Answer: 


True 
(l) If A has a row of zeros, then so does adj(A).


Answer: 


False 




Chapter 2 Supplementary Exercises 


In Exercises 1—8, evaluate the determinant of the given matrix by (a) cofactor expansion and (b) using 
elementary row operations to introduce zeros into the matrix. 


“iy 


Answer: 


—18 


2.| 7 =—1 
—2 =—6 
2 


5.|3 0 =1 
1d 4 
04 2 


Answer: 
—10 


6.|-5 1 
3 60 
2 


hme he 


9. Evaluate the determinants in Exercises 3—6 by using the arrow technique (see Example 7 in Section 2.1). 


Answer: 


Exercise 3: 24; Exercise 4: 0; Exercise 5: −10; Exercise 6: −48


10. (a) Construct a 4 x 4 matrix whose determinant is easy to compute using cofactor expansion but hard to 
evaluate using elementary row operations. 


(b) Construct a 4 x 4 matrix whose determinant is easy to compute using elementary row operations but 
hard to evaluate using cofactor expansion. 


11. Use the determinant to decide whether the matrices in Exercises 1—4 are invertible. 


Answer: 


The matrices in Exercises 1—3 are invertible, the matrix in Exercise 4 is not. 


12. Use the determinant to decide whether the matrices in Exercises 5—8 are invertible. 


In Exercises 13—15, find the determinant of the given matrix by any method. 


13.) 5 4=3 
b=—-2 —=—3 
Answer: 
—b* + 5b —21 
14.}3 =4 a 
a*~ 1 2 
2 a-—l 4 
15.10 0 O O =—3 
000600 =—4)~ OO 
00 =1 0. (0 
02 0 0 0 
50 oO O 0 
Answer: 
—120 
16. Solve for x. 
= 10 —3 
: 1_al=2 = 76 
Ue | ee ee 


In Exercises 17—24, use the adjoint method (Theorem 2.3.6) to find the inverse of the given matrix, if it 


exists. 


17. The matrix in Exercise 1. 


Answer: 


LO|Do sole 


18. The matrix in Exercise 2. 


19. The matrix in Exercise 3. 


Answer: 

ne ee 
8 8 8 
pr pare 
8 24 24 
jee eee 
4 12 12 


20. The matrix in Exercise 4. 


21. The matrix in Exercise 5. 


Answer: 
| re eh 
5 5 10 
bs oe 2 
5 5 5 
es Be yee 
5 5 10 


22. The matrix in Exercise 6. 


23. The matrix in Exercise 7. 
Answer: 


10 2 32 2? 
S22 329 329 329 


24. The matrix in Exercise 8. 


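25. Use Cramer's rule to solve for $x'$ and $y'$ in terms of x and y.
\[
\begin{aligned}
x &= \tfrac{3}{5}x' - \tfrac{4}{5}y' \\
y &= \tfrac{4}{5}x' + \tfrac{3}{5}y'
\end{aligned}
\]
Answer:
\[ x' = \tfrac{3}{5}x + \tfrac{4}{5}y, \quad y' = -\tfrac{4}{5}x + \tfrac{3}{5}y \]


26. Use Cramer's rule to solve for $x'$ and $y'$ in terms of x and y.
\[
\begin{aligned}
x &= x'\cos\theta - y'\sin\theta \\
y &= x'\sin\theta + y'\cos\theta
\end{aligned}
\]


27. By examining the determinant of the coefficient matrix, show that the following system has a nontrivial
solution if and only if $\alpha = \beta$.
\[
\begin{aligned}
x + y + \alpha z &= 0 \\
x + y + \beta z &= 0 \\
\alpha x + \beta y + z &= 0
\end{aligned}
\]


28. Let A be a $3 \times 3$ matrix, each of whose entries is 1 or 0. What is the largest possible value for det(A)?


29. (a) For the triangle in the accompanying figure, use trigonometry to show that
\[
\begin{aligned}
b\cos\gamma + c\cos\beta &= a \\
c\cos\alpha + a\cos\gamma &= b \\
a\cos\beta + b\cos\alpha &= c
\end{aligned}
\]
and then apply Cramer's rule to show that
\[ \cos\alpha = \frac{b^2 + c^2 - a^2}{2bc} \]


(b) Use Cramer's rule to obtain similar formulas for $\cos\beta$ and $\cos\gamma$.


Figure Ex-29


Answer:
(b) \[ \cos\beta = \frac{a^2 + c^2 - b^2}{2ac}, \quad \cos\gamma = \frac{a^2 + b^2 - c^2}{2ab} \]


30. Use determinants to show that for all real values of $\lambda$, the only solution of
\[
\begin{aligned}
x - 2y &= \lambda x \\
x - y &= \lambda y
\end{aligned}
\]
is $x = 0$, $y = 0$.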


31. Prove: If A is invertible, then adj(A) is invertible and


\[ [\operatorname{adj}(A)]^{-1} = \frac{1}{\det(A)} A = \operatorname{adj}(A^{-1}) \]


32. Prove: If A is an $n \times n$ matrix, then
\[ \det[\operatorname{adj}(A)] = [\det(A)]^{n-1} \]


33. Prove: If the entries in each row of an $n \times n$ matrix A add up to zero, then the determinant of A is zero.
[Hint: Consider the product AX, where X is the $n \times 1$ matrix, each of whose entries is one.]


34. (a) In the accompanying figure, the area of the triangle ABC can be expressed as


\[ \text{area } ABC = \text{area } ADEC + \text{area } CEFB - \text{area } ADFB \]


Use this and the fact that the area of a trapezoid equals $\frac{1}{2}$ the altitude times the sum of the parallel
sides to show that
\[ \text{area } ABC = \frac{1}{2} \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix} \]


[Note: In the derivation of this formula, the vertices are labeled such that the triangle is traced
counterclockwise proceeding from $(x_1, y_1)$ to $(x_2, y_2)$ to $(x_3, y_3)$. For a clockwise orientation, the
determinant above yields the negative of the area.]


(b) Use the result in (a) to find the area of the triangle with vertices (3, 3), (4, 0), (-2, -1).


Figure Ex-34


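35. Use the fact that 21,375, 38,798, 34,162, 40,223, and 79,154 are all divisible by 19 to show that


\[
\begin{vmatrix}
2 & 1 & 3 & 7 & 5 \\
3 & 8 & 7 & 9 & 8 \\
3 & 4 & 1 & 6 & 2 \\
4 & 0 & 2 & 2 & 3 \\
7 & 9 & 1 & 5 & 4
\end{vmatrix}
\]


is divisible by 19 without directly evaluating the determinant.
36. Without directly evaluating the determinant, show that
\[
\begin{vmatrix}
\sin\alpha & \cos\alpha & \sin(\alpha + \delta) \\
\sin\beta & \cos\beta & \sin(\beta + \delta) \\
\sin\gamma & \cos\gamma & \sin(\gamma + \delta)
\end{vmatrix} = 0
\]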



CHAPTER 3


Euclidean Vector Spaces 


CHAPTER CONTENTS 


3.1. Vectors in 2-Space, 3-Space, and n-Space 


3.2. Norm, Dot Product, and Distance in $R^n$
3.3. Orthogonality 

3.4. The Geometry of Linear Systems 

3.5. Cross Product 


INTRODUCTION 


Engineers and physicists distinguish between two types of physical quantities—scalars, 
which are quantities that can be described by a numerical value alone, and vectors, which 
are quantities that require both a number and a direction for their complete physical 
description. For example, temperature, length, and speed are scalars because they can be
fully described by a number that tells “how much”: a temperature of 20°C, a length of 5
cm, or a speed of 75 km/h. In contrast, velocity and force are vectors because they require
a number that tells “how much” and a direction that tells “which way”: say, a boat
moving at 10 knots in a direction 45° northeast, or a force of 100 lb acting vertically.
Although the notions of vectors and scalars that we will study in this text have their 
origins in physics and engineering, we will be more concerned with using them to build 
mathematical structures and then applying those structures to such diverse fields as 
genetics, computer science, economics, telecommunications, and environmental science. 




3.1 Vectors in 2-Space, 3-Space, and n-Space 


Linear algebra is concerned with two kinds of mathematical objects, “matrices” and “vectors.” We are already 
familiar with the basic ideas about matrices, so in this section we will introduce some of the basic ideas about 
vectors. As we progress through this text we will see that vectors and matrices are closely related and that 
much of linear algebra is concerned with that relationship. 


Geometric Vectors 


Engineers and physicists represent vectors in two dimensions (also called 2-space) or in three dimensions 
(also called 3-space) by arrows. The direction of the arrowhead specifies the direction of the vector and the 
length of the arrow specifies the magnitude. Mathematicians call these geometric vectors. The tail of the 
arrow is called the initial point of the vector and the tip the terminal point (Figure 3.1.1). 


Terminal point 


Initial point 


Figure 3.1.1 


In this text we will denote vectors in boldface type such as a, b, v, w, and x, and we will denote scalars in 
lowercase italic type such as a, k, v, w, and x. When we want to indicate that a vector v has initial point A and 
terminal point B, then, as shown in Figure 3.1.2, we will write 


\[ \mathbf{v} = \overrightarrow{AB} \]

Figure 3.1.2


Vectors with the same length and direction, such as those in Figure 3.1.3, are said to be equivalent. Since we 
want a vector to be determined solely by its length and direction, equivalent vectors are regarded to be the 
same vector even though they may be in different positions. Equivalent vectors are also said to be equal, 


which we indicate by writing 
\[ \mathbf{v} = \mathbf{w} \]


Figure 3.1.3 Equivalent vectors


The vector whose initial and terminal points coincide has length zero, so we call this the zero vector and 
denote it by 0. The zero vector has no natural direction, so we will agree that it can be assigned any direction 
that is convenient for the problem at hand. 


Vector Addition 


There are a number of important algebraic operations on vectors, all of which have their origin in laws of 
physics. 


Parallelogram Rule for Vector Addition 


If v and w are vectors in 2-space or 3-space that are positioned so their initial points coincide, then the
two vectors form adjacent sides of a parallelogram, and the sum v + w is the vector represented by
the arrow from the common initial point of v and w to the opposite vertex of the parallelogram
(Figure 3.1.4a).


v+w 


(a) (b) (c) 


Figure 3.1.4 


Here is another way to form the sum of two vectors. 


Triangle Rule for Vector Addition 


If v and w are vectors in 2-space or 3-space that are positioned so the initial point of w is at the
terminal point of v, then the sum v + w is represented by the arrow from the initial point of v to the
terminal point of w (Figure 3.1.4b).


In Figure 3.1.4c we have constructed the sums v + w and w + v by the triangle rule. This construction makes
it evident that


\[ \mathbf{v} + \mathbf{w} = \mathbf{w} + \mathbf{v} \tag{1} \]
and that the sum obtained by the triangle rule is the same as the sum obtained by the parallelogram rule.


Vector addition can also be viewed as a process of translating points. 


Vector Addition Viewed as Translation 


If v, w, and v + w are positioned so their initial points coincide, then the terminal point of v + w can
be viewed in two ways:


1. The terminal point of v + w is the point that results when the terminal point of v is translated in
the direction of w by a distance equal to the length of w (Figure 3.1.5a).


2. The terminal point of v + w is the point that results when the terminal point of w is translated in
the direction of v by a distance equal to the length of v (Figure 3.1.5b).


Accordingly, we say that v + w is the translation of v by w or, alternatively, the translation of w by v.


Figure 3.1.5 


Vector Subtraction 


In ordinary arithmetic we can write $a - b = a + (-b)$, which expresses subtraction in terms of addition.
There is an analogous idea in vector arithmetic. 


Vector Subtraction 


The negative of a vector v, denoted by −v, is the vector that has the same length as v but is
oppositely directed (Figure 3.1.6a), and the difference of v from w, denoted by w − v, is taken to be
the sum


\[ \mathbf{w} - \mathbf{v} = \mathbf{w} + (-\mathbf{v}) \tag{2} \]


(b) (c) 


Figure 3.1.6 


The difference of v from w can be obtained geometrically by the parallelogram method shown in Figure
3.1.6b, or more directly by positioning w and v so their initial points coincide and drawing the vector from the
terminal point of v to the terminal point of w (Figure 3.1.6c).


Scalar Multiplication 


Sometimes there is a need to change the length of a vector or change its length and reverse its direction. This 
is accomplished by a type of multiplication in which vectors are multiplied by scalars. As an example, the 
product 2v denotes the vector that has the same direction as v but twice the length, and the product −2v
denotes the vector that is oppositely directed to v and has twice the length. Here is the general result.


Scalar Multiplication 


If v is a nonzero vector in 2-space or 3-space, and if k is a nonzero scalar, then we define the scalar
product of v by k to be the vector whose length is |k| times the length of v and whose direction is the
same as that of v if k is positive and opposite to that of v if k is negative. If k = 0 or v = 0, then we
define kv to be 0.


Figure 3.1.7 shows the geometric relationship between a vector v and some of its scalar multiples. In
particular, observe that (−1)v has the same length as v but is oppositely directed; therefore,


\[ (-1)\mathbf{v} = -\mathbf{v} \tag{3} \]


Figure 3.1.7 


Parallel and Collinear Vectors 


Suppose that v and w are vectors in 2-space or 3-space with a common initial point. If one of the vectors is a
scalar multiple of the other, then the vectors lie on a common line, so it is reasonable to say that they are
collinear (Figure 3.1.8a). However, if we translate one of the vectors, as indicated in Figure 3.1.8b, then the
vectors are parallel but no longer collinear. This creates a linguistic problem because translating a vector does
not change it. The only way to resolve this problem is to agree that the terms parallel and collinear mean the
same thing when applied to vectors. Although the vector 0 has no clearly defined direction, we will regard it
to be parallel to all vectors when convenient.


ky 


(a) (b) 


Figure 3.1.8 


Sums of Three or More Vectors 


Vector addition satisfies the associative law for addition, meaning that when we add three vectors, say u, v,
and w, it does not matter which two we add first; that is,

\[ \mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w} \]
It follows from this that there is no ambiguity in the expression u + v + w because the same result is obtained
no matter how the vectors are grouped.


A simple way to construct u + v + w is to place the vectors “tip to tail” in succession and then draw the
vector from the initial point of u to the terminal point of w (Figure 3.1.9a). The tip-to-tail method also works
for four or more vectors (Figure 3.1.9b). The tip-to-tail method also makes it evident that if u, v, and w are
vectors in 3-space with a common initial point, then u + v + w is the diagonal of the parallelepiped that has
the three vectors as adjacent sides (Figure 3.1.9c).


(a) (b) (c) 


Figure 3.1.9 


Vectors in Coordinate Systems 


Up until now we have discussed vectors without reference to a coordinate system. However, as we will soon 
see, computations with vectors are much simpler to perform if a coordinate system is present to work with. 


The component forms of the zero vector are 
0 = (0, 0) in 2-space and 0 = (0, 0, 0) in 
3-space. 


If a vector v in 2-space or 3-space is positioned with its initial point at the origin of a rectangular coordinate
system, then the vector is completely determined by the coordinates of its terminal point (Figure 3.1.10). We
call these coordinates the components of v relative to the coordinate system. We will write $\mathbf{v} = (v_1, v_2)$ to
denote a vector v in 2-space with components $(v_1, v_2)$, and $\mathbf{v} = (v_1, v_2, v_3)$ to denote a vector v in 3-space
with components $(v_1, v_2, v_3)$.


Figure 3.1.10


It should be evident geometrically that two vectors in 2-space or 3-space are equivalent if and only if they 
have the same terminal point when their initial points are at the origin. Algebraically, this means that two 
vectors are equivalent if and only if their corresponding components are equal. Thus, for example, the vectors 


$\mathbf{v} = (v_1, v_2, v_3)$ and $\mathbf{w} = (w_1, w_2, w_3)$
in 3-space are equivalent if and only if

$v_1 = w_1, \quad v_2 = w_2, \quad v_3 = w_3$


Remark It may have occurred to you that an ordered pair $(v_1, v_2)$ can represent either a vector with
components $v_1$ and $v_2$ or a point with coordinates $v_1$ and $v_2$ (and similarly for ordered triples). Both are valid
geometric interpretations, so the appropriate choice will depend on the geometric viewpoint that we want to
emphasize (Figure 3.1.11).


Figure 3.1.11 The ordered pair $(v_1, v_2)$ can represent a point or a vector.


Vectors Whose Initial Point Is Not at the Origin 


It is sometimes necessary to consider vectors whose initial points are not at the origin. If $\overrightarrow{P_1P_2}$ denotes the
vector with initial point $P_1(x_1, y_1)$ and terminal point $P_2(x_2, y_2)$, then the components of this vector are
given by the formula

$\overrightarrow{P_1P_2} = (x_2 - x_1, y_2 - y_1)$ (4)

That is, the components of $\overrightarrow{P_1P_2}$ are obtained by subtracting the coordinates of the initial point from the
coordinates of the terminal point. For example, in Figure 3.1.12 the vector $\overrightarrow{P_1P_2}$ is the difference of vectors
$\overrightarrow{OP_2}$ and $\overrightarrow{OP_1}$, so

$\overrightarrow{P_1P_2} = \overrightarrow{OP_2} - \overrightarrow{OP_1} = (x_2, y_2) - (x_1, y_1) = (x_2 - x_1, y_2 - y_1)$

As you might expect, the components of a vector in 3-space that has initial point $P_1(x_1, y_1, z_1)$ and terminal
point $P_2(x_2, y_2, z_2)$ are given by

$\overrightarrow{P_1P_2} = (x_2 - x_1, y_2 - y_1, z_2 - z_1)$ (5)


Figure 3.1.12 [$\mathbf{v} = \overrightarrow{P_1P_2} = \overrightarrow{OP_2} - \overrightarrow{OP_1}$]


EXAMPLE 1 Finding the Components of a Vector 


The components of the vector $\mathbf{v} = \overrightarrow{P_1P_2}$ with initial point $P_1(2, -1, 4)$ and terminal point
$P_2(7, 5, -8)$ are
$\mathbf{v} = (7 - 2, 5 - (-1), (-8) - 4) = (5, 6, -12)$
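Formulas 4 and 5 translate directly into a componentwise subtraction. The following is a minimal sketch in Python; the function name `components` and the use of tuples for points are illustrative assumptions, not notation from the text.

```python
def components(initial, terminal):
    """Components of the vector from `initial` to `terminal`.

    Subtracts the coordinates of the initial point from the coordinates
    of the terminal point, as in Formulas 4 and 5.
    """
    if len(initial) != len(terminal):
        raise ValueError("points must have the same number of coordinates")
    return tuple(t - i for i, t in zip(initial, terminal))


# The points of Example 1: P1(2, -1, 4) and P2(7, 5, -8)
print(components((2, -1, 4), (7, 5, -8)))  # (5, 6, -12)
```

The same function works unchanged for points with any number of coordinates, which anticipates the extension to n-space below.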


n-Space 


The idea of using ordered pairs and triples of real numbers to represent points in two-dimensional space and 
three-dimensional space was well known in the eighteenth and nineteenth centuries. By the dawn of the 
twentieth century, mathematicians and physicists were exploring the use of “higher-dimensional” spaces in 
mathematics and physics. Today, even the layman is familiar with the notion of time as a fourth dimension, an 
idea used by Albert Einstein in developing the general theory of relativity. Today, physicists working in the 
field of “string theory” commonly use 11-dimensional space in their quest for a unified theory that will 
explain how the fundamental forces of nature work. Much of the remaining work in this section is concerned 
with extending the notion of space to n-dimensions. 


To explore these ideas further, we start with some terminology and notation. The set of all real numbers can 
be viewed geometrically as a line. It is called the real line and is denoted by $R$ or $R^1$. The superscript
reinforces the intuitive idea that a line is one-dimensional. The set of all ordered pairs of real numbers (called
2-tuples) and the set of all ordered triples of real numbers (called 3-tuples) are denoted by $R^2$ and $R^3$,
respectively. The superscript reinforces the idea that the ordered pairs correspond to points in the plane 
(two-dimensional) and ordered triples to points in space (three-dimensional). The following definition extends 
this idea. 


DEFINITION 1 


If n is a positive integer, then an ordered n-tuple is a sequence of n real numbers $(v_1, v_2, \ldots, v_n)$.
The set of all ordered n-tuples is called n-space and is denoted by $R^n$.


Remark You can think of the numbers in an n-tuple $(v_1, v_2, \ldots, v_n)$ as either the coordinates of a
generalized point or the components of a generalized vector, depending on the geometric image you want to 
bring to mind—the choice makes no difference mathematically, since it is the algebraic properties of n-tuples 
that are of concern. 


Here are some typical applications that lead to n-tuples. 


Experimental Data A scientist performs an experiment and makes n numerical measurements each time 
the experiment is performed. The result of each experiment can be regarded as a vector 
$\mathbf{y} = (y_1, y_2, \ldots, y_n)$ in $R^n$ in which $y_1, y_2, \ldots, y_n$ are the measured values.


Storage and Warehousing A national trucking company has 15 depots for storing and servicing its trucks. 
At each point in time the distribution of trucks in the service depots can be described by a 15-tuple 

$\mathbf{x} = (x_1, x_2, \ldots, x_{15})$ in which $x_1$ is the number of trucks in the first depot, $x_2$ is the number in the second
depot, and so forth. 


Electrical Circuits A certain kind of processing chip is designed to receive four input voltages and
produce three output voltages in response. The input voltages can be regarded as vectors in $R^4$ and the
output voltages as vectors in $R^3$. Thus, the chip can be viewed as a device that transforms an input vector
$\mathbf{v} = (v_1, v_2, v_3, v_4)$ in $R^4$ into an output vector $\mathbf{w} = (w_1, w_2, w_3)$ in $R^3$.

Graphical Images One way in which color images are created on computer screens is by assigning each
pixel (an addressable point on the screen) three numbers that describe the hue, saturation, and brightness
of the pixel. Thus, a complete color image can be viewed as a set of 5-tuples of the form $\mathbf{v} = (x, y, h, s, b)$
in which x and y are the screen coordinates of a pixel and h, s, and b are its hue, saturation, and brightness.

Economics One approach to economic analysis is to divide an economy into sectors (manufacturing, 
services, utilities, and so forth) and measure the output of each sector by a dollar value. Thus, in an 
economy with 10 sectors the economic output of the entire economy can be represented by a 10-tuple 

$\mathbf{s} = (s_1, s_2, \ldots, s_{10})$ in which the numbers $s_1, s_2, \ldots, s_{10}$ are the outputs of the individual sectors.


Mechanical Systems Suppose that six particles move along the same coordinate line so that at time t their
coordinates are $x_1, x_2, \ldots, x_6$ and their velocities are $v_1, v_2, \ldots, v_6$, respectively. This information can be
represented by the vector

$\mathbf{v} = (x_1, x_2, x_3, x_4, x_5, x_6, v_1, v_2, v_3, v_4, v_5, v_6, t)$

in $R^{13}$. This vector is called the state of the particle system at time t.


Albert Einstein (1879-1955) 


Historical Note The German-born physicist Albert Einstein immigrated to the United States in 
1935, where he settled at Princeton University. Einstein spent the last three decades of his life 
working unsuccessfully at producing a unified field theory that would establish an underlying link 
between the forces of gravity and electromagnetism. Recently, physicists have made progress on the 
problem using a framework known as string theory. In this theory the smallest, indivisible 
components of the Universe are not particles but loops that behave like vibrating strings. Whereas 


Einstein's space-time universe was four-dimensional, strings reside in an 11-dimensional world that is 
the focus of current research. 
[Image: © Bettmann/© Corbis]


Operations on Vectors in $R^n$


Our next goal is to define useful operations on vectors in $R^n$. These operations will all be natural extensions
of the familiar operations on vectors in $R^2$ and $R^3$. We will denote a vector v in $R^n$ using the notation

$\mathbf{v} = (v_1, v_2, \ldots, v_n)$


and we will call 0 = (0, 0, ..., 0) the zero vector. 


We noted earlier that in $R^2$ and $R^3$ two vectors are equivalent (equal) if and only if their corresponding


components are the same. Thus, we make the following definition. 


DEFINITION 2 


Vectors $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ and $\mathbf{w} = (w_1, w_2, \ldots, w_n)$ in $R^n$ are said to be equivalent (also called
equal) if

$v_1 = w_1, \quad v_2 = w_2, \quad \ldots, \quad v_n = w_n$

We indicate this by writing $\mathbf{v} = \mathbf{w}$.


EXAMPLE 2 Equality of Vectors


$(a, b, c, d) = (1, -4, 2, 7)$
if and only if a = 1, b = -4, c = 2, and d = 7.


Our next objective is to define the operations of addition, subtraction, and scalar multiplication for vectors in
$R^n$. To motivate these ideas, we will consider how these operations can be performed on vectors in $R^2$ using
components. By studying Figure 3.1.13 you should be able to deduce that if $\mathbf{v} = (v_1, v_2)$ and $\mathbf{w} = (w_1, w_2)$,
then

$\mathbf{v} + \mathbf{w} = (v_1 + w_1, v_2 + w_2)$ (6)

$k\mathbf{v} = (kv_1, kv_2)$ (7)

In particular, it follows from 7 that

$-\mathbf{v} = (-1)\mathbf{v} = (-v_1, -v_2)$ (8)

and hence that

$\mathbf{w} - \mathbf{v} = \mathbf{w} + (-\mathbf{v}) = (w_1 - v_1, w_2 - v_2)$ (9)


Figure 3.1.13


Motivated by Formulas 6-9, we make the following definition.


DEFINITION 3


If $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ and $\mathbf{w} = (w_1, w_2, \ldots, w_n)$ are vectors in $R^n$, and if k is any scalar, then we
define

$\mathbf{v} + \mathbf{w} = (v_1 + w_1, v_2 + w_2, \ldots, v_n + w_n)$ (10)

$k\mathbf{v} = (kv_1, kv_2, \ldots, kv_n)$ (11)

$-\mathbf{v} = (-v_1, -v_2, \ldots, -v_n)$ (12)

$\mathbf{w} - \mathbf{v} = \mathbf{w} + (-\mathbf{v}) = (w_1 - v_1, w_2 - v_2, \ldots, w_n - v_n)$ (13)


In words, vectors are added (or subtracted) by 
adding (or subtracting) their corresponding 
components, and a vector is multiplied by a 
scalar by multiplying each component by that 
scalar. 


EXAMPLE 3 Algebraic Operations Using Components 


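If $\mathbf{v} = (1, -3, 2)$ and $\mathbf{w} = (4, 2, 1)$, then
$\mathbf{v} + \mathbf{w} = (5, -1, 3)$, $2\mathbf{v} = (2, -6, 4)$,
$-\mathbf{w} = (-4, -2, -1)$, $\mathbf{v} - \mathbf{w} = \mathbf{v} + (-\mathbf{w}) = (-3, -5, 1)$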
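The componentwise definitions can be tried out with a few lines of Python. This is only a sketch: the helper names `add`, `scale`, `negate`, and `subtract` are hypothetical, and vectors are modeled as plain tuples.

```python
def add(v, w):
    """v + w, formed by adding corresponding components."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of components")
    return tuple(a + b for a, b in zip(v, w))


def scale(k, v):
    """kv, formed by multiplying each component by the scalar k."""
    return tuple(k * a for a in v)


def negate(v):
    """-v, defined as (-1)v."""
    return scale(-1, v)


def subtract(w, v):
    """w - v, defined as w + (-v)."""
    return add(w, negate(v))


# The vectors of Example 3
v = (1, -3, 2)
w = (4, 2, 1)
print(add(v, w))       # (5, -1, 3)
print(scale(2, v))     # (2, -6, 4)
print(negate(w))       # (-4, -2, -1)
print(subtract(v, w))  # (-3, -5, 1)
```

Subtraction is deliberately built from addition and negation so that the code mirrors the way the definition is stated.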


The following theorem summarizes the most important properties of vector operations. 


THEOREM 3.1.1 


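If u, v, and w are vectors in $R^n$, and if k and m are scalars, then:
(a) $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$

(b) $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$

(c) $\mathbf{u} + \mathbf{0} = \mathbf{0} + \mathbf{u} = \mathbf{u}$

(d) $\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$

(e) $k(\mathbf{u} + \mathbf{v}) = k\mathbf{u} + k\mathbf{v}$

(f) $(k + m)\mathbf{u} = k\mathbf{u} + m\mathbf{u}$

(g) $k(m\mathbf{u}) = (km)\mathbf{u}$

(h) $1\mathbf{u} = \mathbf{u}$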


We will prove part (b) and leave some of the other proofs as exercises.
Proof (b) Let $\mathbf{u} = (u_1, u_2, \ldots, u_n)$, $\mathbf{v} = (v_1, v_2, \ldots, v_n)$, and $\mathbf{w} = (w_1, w_2, \ldots, w_n)$. Then

$(\mathbf{u} + \mathbf{v}) + \mathbf{w} = ((u_1, u_2, \ldots, u_n) + (v_1, v_2, \ldots, v_n)) + (w_1, w_2, \ldots, w_n)$
$= (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n) + (w_1, w_2, \ldots, w_n)$ [Vector addition]
$= ((u_1 + v_1) + w_1, (u_2 + v_2) + w_2, \ldots, (u_n + v_n) + w_n)$ [Vector addition]
$= (u_1 + (v_1 + w_1), u_2 + (v_2 + w_2), \ldots, u_n + (v_n + w_n))$ [Regroup]
$= (u_1, u_2, \ldots, u_n) + (v_1 + w_1, v_2 + w_2, \ldots, v_n + w_n)$ [Vector addition]
$= \mathbf{u} + (\mathbf{v} + \mathbf{w})$


The following additional properties of vectors in $R^n$ can be deduced easily by expressing the vectors in terms
of components (verify).


THEOREM 3.1.2


If v is a vector in $R^n$ and k is a scalar, then:

(a) $0\mathbf{v} = \mathbf{0}$
(b) $k\mathbf{0} = \mathbf{0}$
(c) $(-1)\mathbf{v} = -\mathbf{v}$


Calculating Without Components 


One of the powerful consequences of Theorems 3.1.1 and 3.1.2 is that they allow calculations to be performed
without expressing the vectors in terms of components. For example, suppose that x, a, and b are vectors in
$R^n$, and we want to solve the vector equation $\mathbf{x} + \mathbf{a} = \mathbf{b}$ for the vector x without using components. We could
proceed as follows:

$\mathbf{x} + \mathbf{a} = \mathbf{b}$ [Given]
$(\mathbf{x} + \mathbf{a}) + (-\mathbf{a}) = \mathbf{b} + (-\mathbf{a})$ [Add the negative of a to both sides]
$\mathbf{x} + (\mathbf{a} + (-\mathbf{a})) = \mathbf{b} - \mathbf{a}$ [Part (b) of Theorem 3.1.1]
$\mathbf{x} + \mathbf{0} = \mathbf{b} - \mathbf{a}$ [Part (d) of Theorem 3.1.1]
$\mathbf{x} = \mathbf{b} - \mathbf{a}$ [Part (c) of Theorem 3.1.1]


While this method is obviously more cumbersome than computing with components in $R^n$, it will become
important later in the text where we will encounter more general kinds of vectors.


Linear Combinations 


Addition, subtraction, and scalar multiplication are frequently used in combination to form new vectors. For 
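example, if $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are vectors in $R^n$, then the vectors
$\mathbf{u} = 2\mathbf{v}_1 + 3\mathbf{v}_2 + \mathbf{v}_3$ and $\mathbf{w} = \mathbf{v}_1 - 6\mathbf{v}_2 + 8\mathbf{v}_3$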


are formed in this way. In general, we make the following definition. 


DEFINITION 4 


If w is a vector in $R^n$, then w is said to be a linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ in $R^n$ if it
can be expressed in the form

$\mathbf{w} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r$ (14)

where $k_1, k_2, \ldots, k_r$ are scalars. These scalars are called the coefficients of the linear combination. In
the case where r = 1, Formula 14 becomes $\mathbf{w} = k_1\mathbf{v}_1$, so that a linear combination of a single vector
is just a scalar multiple of that vector.


Note that this definition of a linear combination 
is consistent with that given in the context of 
matrices (see Definition 6 in Section 1.3). 
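As an illustration of Formula 14, the sketch below forms a linear combination componentwise. The function name `linear_combination` and the sample vectors are assumptions chosen for the example; they do not come from the text.

```python
def linear_combination(coefficients, vectors):
    """Return k1*v1 + k2*v2 + ... + kr*vr, computed component by component."""
    if not vectors or len(coefficients) != len(vectors):
        raise ValueError("need one coefficient per vector")
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all vectors must lie in the same n-space")
    return tuple(
        sum(k * v[i] for k, v in zip(coefficients, vectors))
        for i in range(n)
    )


# Hypothetical vectors in 3-space, used only for illustration
v1, v2, v3 = (1, 0, 2), (0, 1, -1), (3, 1, 0)

u = linear_combination((2, 3, 1), (v1, v2, v3))   # u = 2v1 + 3v2 + v3
w = linear_combination((1, -6, 8), (v1, v2, v3))  # w = v1 - 6v2 + 8v3
print(u)  # (5, 4, 1)
print(w)  # (25, 2, 8)
```

With a single vector and a single coefficient the function reduces to scalar multiplication, matching the r = 1 case noted in Definition 4.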


Application of Linear Combinations to Color Models 


Colors on computer monitors are commonly based on what is called the RGB color model. Colors in 
this system are created by adding together percentages of the primary colors red (R), green (G), and 
blue (B). One way to do this is to identify the primary colors with the vectors 


$\mathbf{r} = (1, 0, 0)$ (pure red),
$\mathbf{g} = (0, 1, 0)$ (pure green),
$\mathbf{b} = (0, 0, 1)$ (pure blue)
in $R^3$ and to create all other colors by forming linear combinations of r, g, and b using coefficients
between 0 and 1, inclusive; these coefficients represent the percentage of each pure color in the mix.
The set of all such color vectors is called RGB space or the RGB color cube (Figure 3.1.14). Thus,
each color vector c in this cube is expressible as a linear combination of the form
$\mathbf{c} = k_1\mathbf{r} + k_2\mathbf{g} + k_3\mathbf{b}$
$= k_1(1, 0, 0) + k_2(0, 1, 0) + k_3(0, 0, 1)$
$= (k_1, k_2, k_3)$
where $0 \le k_i \le 1$. As indicated in the figure, the corners of the cube represent the pure primary colors


together with the colors black, white, magenta, cyan, and yellow. The vectors along the diagonal 
running from black to white correspond to shades of gray. 
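The color model can be sketched in a few lines of Python. The function name `mix` and the tuple representation are illustrative assumptions; the code simply forms the linear combination described above and rejects coefficients outside the interval from 0 to 1.

```python
# Pure primary colors as vectors in 3-space
R = (1, 0, 0)
G = (0, 1, 0)
B = (0, 0, 1)


def mix(k1, k2, k3):
    """Return the color vector k1*r + k2*g + k3*b in the RGB color cube."""
    for k in (k1, k2, k3):
        if not 0 <= k <= 1:
            raise ValueError("coefficients must lie between 0 and 1, inclusive")
    return tuple(k1 * r + k2 * g + k3 * b for r, g, b in zip(R, G, B))


print(mix(1, 1, 0))        # (1, 1, 0), the corner reached by full red plus full green
print(mix(0.5, 0.5, 0.5))  # (0.5, 0.5, 0.5), a point on the black-to-white diagonal
```

Because r, g, and b are the standard coordinate directions, the result is always just the triple of coefficients, as the last line of the displayed formula states.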


Figure 3.1.14 (RGB color cube)


Alternative Notations for Vectors 
Up to now we have been writing vectors in $R^n$ using the notation

$\mathbf{v} = (v_1, v_2, \ldots, v_n)$ (15)

We call this the comma-delimited form. However, since a vector in $R^n$ is just a list of its n components in a
specific order, any notation that displays those components in the correct order is a valid way of representing
the vector. For example, the vector in 15 can be written as

$\mathbf{v} = [v_1 \;\; v_2 \;\; \cdots \;\; v_n]$ (16)

which is called row-matrix form, or as

$\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$ (17)

which is called column-matrix form. The choice of notation is often a matter of taste or convenience, but
sometimes the nature of a problem will suggest a preferred notation. Notations 15, 16, and 17 will all be used
at various places in this text.


Concept Review 
• Geometric vector
• Direction
• Length
• Initial point
• Terminal point
• Equivalent vectors
• Zero vector
• Vector addition: parallelogram rule and triangle rule
• Vector subtraction
• Negative of a vector
• Scalar multiplication
• Collinear (i.e., parallel) vectors
• Components of a vector
• Coordinates of a point
• n-tuple
• n-space
• Vector operations in n-space: addition, subtraction, scalar multiplication
• Linear combination of vectors


Skills 

• Perform geometric operations on vectors: addition, subtraction, and scalar multiplication.
• Perform algebraic operations on vectors: addition, subtraction, and scalar multiplication.
• Determine whether two vectors are equivalent.
• Determine whether two vectors are collinear.
• Sketch vectors whose initial and terminal points are given.
• Find components of a vector whose initial and terminal points are given.
• Prove basic algebraic properties of vectors (Theorems 3.1.1 and 3.1.2).


Exercise Set 3.1 


In Exercises 1-2, draw a coordinate system (as in Figure 3.1.10) and locate the points whose coordinates are
given.


1. (a) (3, 4, 5)
(b) (-3, 4, 5) 
(c) (3, -4, 5) 
(d) (3, 4, -5) 
(e) (-3, -4, 5) 
(f) (-3,4,-5) 


Answer: 


(a) 


(b) 


(c) 


(d) 


(f) 


2. (a) (0,3,-3) 
(b) (3,-3,0) 
(c) (-3, 0, 0) 
(d) (3, 0, 3) 
(e) (0, 0, -3) 
(f) (0, 3, 0) 


In Exercises 3-4, sketch the following vectors with the initial points located at the origin. 
3. (a) v1 = (3, 6)
(b) v2 = (-4, -8)
(c) v3 = (-4, -3)
(d) v4 = (3, 4, 5)
(e) v5 = (3, 3, 0)
(f) v6 = (-1, 0, 2)


Answer: 


(a) 


(b) 


(c) 


(d) 


(e) 


(f) 


4. (a) v1 = (5, -4)
(b) v2 = (3, 0)
(c) v3 = (0, -7)
(d) v4 = (0, 0, -3)
(e) v5 = (0, 4, -1)
(f) v6 = (2, 2, 2)


In Exercises 5-6, sketch the following vectors with the initial points located at the origin.


5. (a) P1(4, 8), P2(3, 7)
(b) P1(3, -5), P2(-4, -7)
(c) P1(3, -7, 2), P2(-2, 5, -4)


Answer: 


(a) 


(b)


(c) 


6. (a) P1(-5, 0), P2(-3, 1)
(b) P1(0, 0), P2(3, 4)
(c) P1(-1, 0, 2), P2(0, -1, 0)
(d) P1(2, 2, 2), P2(0, 0, 0)


In Exercises 7-8, find the components of the vector $\overrightarrow{P_1P_2}$.


7. (a) P1(3, 5), P2(2, 8)
(b) P1(5, -2, 1), P2(2, 4, 2)


Answer:


(a) $\overrightarrow{P_1P_2} = (-1, 3)$
(b) $\overrightarrow{P_1P_2} = (-3, 6, 1)$

8. (a) P1(-6, 2), P2(-4, -1)
(b) P1(0, 0, 0), P2(-1, 6, 1)


9. (a) Find the terminal point of the vector that is equivalent to u = (1, 2) and whose initial point is A(1, 1).
(b) Find the initial point of the vector that is equivalent to u = (1, 1, 3) and whose terminal point is
B(-1, -1, 2).


Answer:


(a) The terminal point is B(2, 3).
(b) The initial point is A(-2, -2, -1).


10. (a) Find the initial point of the vector that is equivalent to u = (1, 2) and whose terminal point is B(2, 0).
(b) Find the terminal point of the vector that is equivalent to u = (1, 1, 3) and whose initial point is
A(0, 2, 0).


11. Find a nonzero vector u with terminal point Q(3, 0, -5) such that
(a) u has the same direction as v = (4, -2, -1).
(b) u is oppositely directed to v = (4, -2, -1).


Answer:
(a) u = (-1, 2, -4) is one possible answer.
(b) u = (7, -2, -6) is one possible answer.


12. Find a nonzero vector u with initial point P(-1, 3, -5) such that
(a) u has the same direction as v = (6, 7, -3).
(b) u is oppositely directed to v = (6, 7, -3).


13. Let u = (4, -1), v = (0, 5), and w = (-3, -3). Find the components of
(a) u + w
(b) v - 3u
(c) 2(u - 5w)
(d) 3v - 2(u + 2w)
(e) -3(w - 2u + v)
(f) (-2u - v) - 5(v + 3w)


Answer:


(a) u + w = (1, -4)
(b) v - 3u = (-12, 8)
(c) 2(u - 5w) = (38, 28)
(d) 3v - 2(u + 2w) = (4, 29)
(e) -3(w - 2u + v) = (33, -12)
(f) (-2u - v) - 5(v + 3w) = (37, 17)


14. Let u = (-3, 1, 2), v = (4, 0, -8), and w = (6, -1, -4). Find the components of
(a) v - w
(b) 6u + 2v
(c) -v + u
(d) 3(v - 4u)
(e) -3(v - 8w)
(f) (2u - 7w) - (8v + u)


15. Let u = (-3, 2, 1, 0), v = (4, 7, -3, 2), and w = (5, -2, 8, 1). Find the components of
(a) v - w
(b) 2u + 7v
(c) -u + (v - 4w)
(d) 6(u - 3v)
(e) -v - w
(f) (6v - w) - (4u + v)


Answer:


(a) (-1, 9, -11, 1)
(b) (22, 53, -19, 14)
(c) (-13, 13, -36, -2)
(d) (-90, -114, 60, -36)
(e) (-9, -5, -5, -3)
(f) (27, 29, -27, 9)


16. Let u, v, and w be the vectors in Exercise 15. Find the vector x that satisfies 5x - 2v = 2(w - 5x).


17. Let u = (5, -1, 0, 3, -3), v = (-1, -1, 7, 2, 0), and w = (-4, 2, -3, -5, 2). Find the
components of
(a) w - u
(b) 2v + 3u
(c) -w + 3(v - u)
(d) 5(-v + 4u - w)
(e) -2(3w + v) + (2u + w)
(f) (1/2)(w - 5v + 2u) + v


Answer:


(a) w - u = (-9, 3, -3, -8, 5)
(b) 2v + 3u = (13, -5, 14, 13, -9)
(c) -w + 3(v - u) = (-14, -2, 24, 2, 7)
(d) 5(-v + 4u - w) = (125, -25, -20, 75, -70)
(e) -2(3w + v) + (2u + w) = (32, -10, 1, 27, -16)
(f) (1/2)(w - 5v + 2u) + v = (9/2, 3/2, -12, -5/2, -2)


18. Let u = (1, 2, -3, 5, 0), v = (0, 4, -1, 1, 2), and w = (7, 1, -4, -2, 3). Find the components of
(a) v + w
(b) 3(2u - v)
(c) (3u - v) - (2u + 4w)


19. Let u = (-3, 1, 2, 4, 4), v = (4, 0, -8, 1, 2), and w = (6, -1, -4, 3, -5). Find the components
of
(a) v - w
(b) 6u + 2v
(c) (2u - 7w) - (8v + u)


Answer:

(a) v - w = (-2, 1, -4, -2, 7)
(b) 6u + 2v = (-10, 6, -4, 26, 28)
(c) (2u - 7w) - (8v + u) = (-77, 8, 94, -25, 23)


20. Let u, v, and w be the vectors in Exercise 18. Find the components of the vector x that satisfies the
equation 3u + v - 2w = 3x + 2w.


21. Let u, v, and w be the vectors in Exercise 19. Find the components of the vector x that satisfies the
equation 2u - v + x = 7x + w.


Answer:
x = (-8/3, 1/2, 8/3, 2/3, 11/6)

22. For what value(s) of t, if any, is the given vector parallel to u = (4, -1)?
(a) (8t, -2)
(b) (8t, 2t)
(c) (1, t^2)


23. Which of the following vectors in $R^6$ are parallel to u = (-2, 1, 0, 3, 5, 1)?
(a) (4, 2, 0, 6, 10, 2)
(b) (4, -2, 0, -6, -10, -2)
(c) (0, 0, 0, 0, 0, 0)


Answer:


(a) Not parallel
(b) Parallel
(c) Parallel


24. Let u = (2, 1, 0, 1, -1) and v = (-2, 3, 1, 0, 2). Find scalars a and b so that
au + bv = (-8, 8, 3, -1, 7).


25. Let u = (1, -1, 3, 5) and v = (2, 1, 0, -3). Find scalars a and b so that au + bv = (1, -4, 9, 18).


Answer:


a = 3, b = -1


26. Find all scalars c1, c2, and c3 such that
c1(1, 2, 0) + c2(2, 1, 1) + c3(0, 3, 1) = (0, 0, 0)


27. Find all scalars c1, c2, and c3 such that
c1(1, -1, 0) + c2(3, 2, 1) + c3(0, 1, 4) = (-1, 1, 19)


Answer:


c1 = 2, c2 = -1, c3 = 5


28. Find all scalars c1, c2, and c3 such that
c1(-1, 0, 2) + c2(2, 2, -2) + c3(1, -2, 1) = (-6, 12, 4)


29. Let u1 = (-1, 3, 2, 0), u2 = (2, 0, 4, -1), u3 = (7, 1, 1, 4), and u4 = (6, 3, 1, 2). Find scalars c1,
c2, c3, and c4 such that c1u1 + c2u2 + c3u3 + c4u4 = (0, 5, 6, -3).


Answer:


c1 = 1, c2 = 1, c3 = -1, c4 = 1


30. Show that there do not exist scalars c1, c2, and c3 such that
c1(1, 0, 1, 0) + c2(1, 0, -2, 1) + c3(2, 0, 1, 2) = (1, -2, 2, 3)


31. Show that there do not exist scalars c1, c2, and c3 such that
c1(-2, 9, 6) + c2(-3, 2, 1) + c3(1, 7, 5) = (0, 5, 4)


32. Consider Figure 3.1.12. Discuss a geometric interpretation of the vector
$\mathbf{u} = \overrightarrow{OP_1} + \frac{1}{2}(\overrightarrow{OP_2} - \overrightarrow{OP_1})$


33. Let P be the point (2, 3, -2) and Q the point (7, -4, 1).
(a) Find the midpoint of the line segment connecting P and Q.
(b) Find the point on the line segment connecting P and Q that is 3/4 of the way from P to Q.


Answer:


(a) (9/2, -1/2, -1/2)
(b) (23/4, -9/4, 1/4)


34. Let P be the point (1, 3, 7). If the point (4, 0, -6) is the midpoint of the line segment connecting P and
Q, what is Q?


35. Prove parts (a), (c), and (d) of Theorem 3.1.1.
36. Prove parts (e)-(h) of Theorem 3.1.1.
37. Prove parts (a)-(c) of Theorem 3.1.2.


True-False Exercises 
In parts (a)-(k) determine whether the statement is true or false, and justify your answer.
(a) Two equivalent vectors must have the same initial point. 

Answer: 


False 


(b) The vectors (a, b) and (a, b, 0) are equivalent.
Answer: 


False 


(c) If k is a scalar and v is a vector, then v and kv are parallel if and only if k ≥ 0.
Answer: 


False 


(d) The vectors v + (u + w) and (w + v) + u are the same.
Answer: 


True 


(e) If u + v = u + w, then v = w.
Answer: 


True 


(f) If a and b are scalars such that au + bv = 0, then u and v are parallel vectors.
Answer: 


False 


(g) Collinear vectors with the same length are equal. 
Answer: 


False 


(h) If (a, b, c) + (x, y, z) = (x, y, z), then (a, b, c) must be the zero vector.


Answer: 


True 


(i) If k and m are scalars and u and v are vectors, then


(k + m)(u + v) = ku + mv


Answer: 


False 


(j) If the vectors v and w are given, then the vector equation
3(2v - x) = 5x - 4w + v


can be solved for x. 
Answer: 


True 
(k) The linear combinations a1v1 + a2v2 and b1v1 + b2v2 can only be equal if a1 = b1 and a2 = b2.


Answer: 


False 




3.2 Norm, Dot Product, and Distance in $R^n$


In this section we will be concerned with the notions of length and distance as they relate to vectors. We will 
first discuss these ideas in $R^2$ and $R^3$ and then extend them algebraically to $R^n$.


Norm of a Vector 


In this text we will denote the length of a vector v by the symbol $\|\mathbf{v}\|$, which is read as the norm of v, the
length of v, or the magnitude of v (the term "norm" being a common mathematical synonym for length). As
suggested in Figure 3.2.1a, it follows from the Theorem of Pythagoras that the norm of a vector $(v_1, v_2)$ in $R^2$
is

$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2}$ (1)

Similarly, for a vector $(v_1, v_2, v_3)$ in $R^3$, it follows from Figure 3.2.1b and two applications of the Theorem of
Pythagoras that

$\|\mathbf{v}\|^2 = (OR)^2 + (RP)^2 = (OQ)^2 + (QR)^2 + (RP)^2 = v_1^2 + v_2^2 + v_3^2$

and hence that

$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + v_3^2}$ (2)

Motivated by the pattern of Formulas 1 and 2 we make the following definition.


DEFINITION 1 


If $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ is a vector in $R^n$, then the norm of v (also called the length of v or the
magnitude of v) is denoted by $\|\mathbf{v}\|$, and is defined by the formula

$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + v_3^2 + \cdots + v_n^2}$ (3)


EXAMPLE 1 Calculating Norms


It follows from Formula 2 that the norm of the vector $\mathbf{v} = (-3, 2, 1)$ in $R^3$ is

$\|\mathbf{v}\| = \sqrt{(-3)^2 + 2^2 + 1^2} = \sqrt{14}$

and it follows from Formula 3 that the norm of the vector $\mathbf{v} = (2, -1, 3, -5)$ in $R^4$ is

$\|\mathbf{v}\| = \sqrt{2^2 + (-1)^2 + 3^2 + (-5)^2} = \sqrt{39}$
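Formula 3 can be checked numerically with a short Python sketch. The function name `norm` is an assumption for illustration; the results are floating-point approximations of the exact square roots.

```python
import math


def norm(v):
    """Norm (length) of a vector: the square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))


# The two vectors of Example 1
print(norm((-3, 2, 1)))      # approximately sqrt(14)
print(norm((2, -1, 3, -5)))  # approximately sqrt(39)
```

Comparing `norm(v) ** 2` with the integer sum of squares is a convenient way to confirm a hand computation without worrying about the square root itself.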


Figure 3.2.1


Our first theorem in this section will generalize to $R^n$ the following three familiar facts about vectors in $R^2$ and
$R^3$:

• Distances are nonnegative.
• The zero vector is the only vector of length zero.
• Multiplying a vector by a scalar multiplies its length by the absolute value of that scalar.

It is important to recognize that just because these results hold in $R^2$ and $R^3$ does not guarantee that they hold
in $R^n$; their validity in $R^n$ must be proved using algebraic properties of n-tuples.


THEOREM 3.2.1 


If v is a vector in $R^n$, and if k is any scalar, then:

(a) $\|\mathbf{v}\| \ge 0$
(b) $\|\mathbf{v}\| = 0$ if and only if $\mathbf{v} = \mathbf{0}$
(c) $\|k\mathbf{v}\| = |k|\,\|\mathbf{v}\|$


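We will prove part (c) and leave (a) and (b) as exercises.


Proof (c) If $\mathbf{v} = (v_1, v_2, \ldots, v_n)$, then $k\mathbf{v} = (kv_1, kv_2, \ldots, kv_n)$, so

$\|k\mathbf{v}\| = \sqrt{(kv_1)^2 + (kv_2)^2 + \cdots + (kv_n)^2}$
$= \sqrt{k^2(v_1^2 + v_2^2 + \cdots + v_n^2)}$
$= |k|\sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$
$= |k|\,\|\mathbf{v}\|$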


Unit Vectors 


A vector of norm 1 is called a unit vector. Such vectors are useful for specifying a direction when length is not
relevant to the problem at hand. You can obtain a unit vector in a desired direction by choosing any nonzero
vector v in that direction and multiplying v by the reciprocal of its length. For example, if v is a vector of
length 2 in $R^2$ or $R^3$, then $\frac{1}{2}\mathbf{v}$ is a unit vector in the same direction as v. More generally, if v is any nonzero
vector in $R^n$, then

$\mathbf{u} = \frac{1}{\|\mathbf{v}\|}\mathbf{v}$ (4)

defines a unit vector that is in the same direction as v. We can confirm that the vector u in Formula 4 is a unit
vector by applying part (c) of Theorem 3.2.1 with $k = 1/\|\mathbf{v}\|$ to obtain

$\|\mathbf{u}\| = \|k\mathbf{v}\| = |k|\,\|\mathbf{v}\| = k\|\mathbf{v}\| = \frac{1}{\|\mathbf{v}\|}\|\mathbf{v}\| = 1$

The process of multiplying a nonzero vector by the reciprocal of its length to obtain a unit vector is called
normalizing v.


WARNING 


Sometimes you will see Formula 4 expressed as

$\mathbf{u} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$

This is just a more compact way of writing that
formula and is not intended to convey that v is
being divided by $\|\mathbf{v}\|$.


EXAMPLE 2 Normalizing a Vector


Find the unit vector u that has the same direction as $\mathbf{v} = (2, 2, -1)$.


Solution The vector v has length

$\|\mathbf{v}\| = \sqrt{2^2 + 2^2 + (-1)^2} = 3$

Thus, from 4

$\mathbf{u} = \frac{1}{3}(2, 2, -1) = \left(\frac{2}{3}, \frac{2}{3}, -\frac{1}{3}\right)$

As a check, you may want to confirm that $\|\mathbf{u}\| = 1$.
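The check suggested at the end of Example 2 can be carried out numerically. The sketch below is one possible way to normalize a vector in Python; the function name `normalize` is an assumption, and the zero vector is rejected because it has no reciprocal length.

```python
import math


def normalize(v):
    """Multiply a nonzero vector by the reciprocal of its length (Formula 4)."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(x / length for x in v)


u = normalize((2, 2, -1))  # the vector of Example 2, whose length is 3
print(u)                   # components close to 2/3, 2/3, -1/3
print(math.isclose(math.sqrt(sum(x * x for x in u)), 1.0))  # True
```

Because the arithmetic is done in floating point, the length of the result is compared with 1 using `math.isclose` rather than exact equality.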


The Standard Unit Vectors 


When a rectangular coordinate system is introduced in $R^2$ or $R^3$, the unit vectors in the positive directions of
the coordinate axes are called the standard unit vectors. In $R^2$ these vectors are denoted by

$\mathbf{i} = (1, 0)$ and $\mathbf{j} = (0, 1)$

and in $R^3$ by

$\mathbf{i} = (1, 0, 0)$, $\mathbf{j} = (0, 1, 0)$, and $\mathbf{k} = (0, 0, 1)$

(Figure 3.2.2). Every vector $\mathbf{v} = (v_1, v_2)$ in $R^2$ and every vector $\mathbf{v} = (v_1, v_2, v_3)$ in $R^3$ can be expressed as a
linear combination of standard unit vectors by writing

$\mathbf{v} = (v_1, v_2) = v_1(1, 0) + v_2(0, 1) = v_1\mathbf{i} + v_2\mathbf{j}$ (5)

$\mathbf{v} = (v_1, v_2, v_3) = v_1(1, 0, 0) + v_2(0, 1, 0) + v_3(0, 0, 1) = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$ (6)

Moreover, we can generalize these formulas to $R^n$ by defining the standard unit vectors in $R^n$ to be

$\mathbf{e}_1 = (1, 0, 0, \ldots, 0)$, $\mathbf{e}_2 = (0, 1, 0, \ldots, 0)$, ..., $\mathbf{e}_n = (0, 0, 0, \ldots, 1)$ (7)

in which case every vector $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ in $R^n$ can be expressed as

$\mathbf{v} = (v_1, v_2, \ldots, v_n) = v_1\mathbf{e}_1 + v_2\mathbf{e}_2 + \cdots + v_n\mathbf{e}_n$ (8)


EXAMPLE 3 Linear Combinations of Standard Unit Vectors


$(2, -3, 4) = 2\mathbf{i} - 3\mathbf{j} + 4\mathbf{k}$
$(7, 3, -4, 5) = 7\mathbf{e}_1 + 3\mathbf{e}_2 - 4\mathbf{e}_3 + 5\mathbf{e}_4$


Figure 3.2.2


Distance in $R^n$


If $P_1$ and $P_2$ are points in $R^2$ or $R^3$, then the length of the vector $\overrightarrow{P_1P_2}$ is equal to the distance d between the
two points (Figure 3.2.3). Specifically, if $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$ are points in $R^2$, then Formula 4 of
Section 3.1 implies that

$d = \|\overrightarrow{P_1P_2}\| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$ (9)

This is the familiar distance formula from analytic geometry. Similarly, the distance between the points
$P_1(x_1, y_1, z_1)$ and $P_2(x_2, y_2, z_2)$ in 3-space is

$d = \|\overrightarrow{P_1P_2}\| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$ (10)


Motivated by Formulas 9 and 10, we make the following definition. 


DEFINITION 2 


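If $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are points in $R^n$, then we denote the distance between
u and v by $d(\mathbf{u}, \mathbf{v})$ and define it to be

$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_n - v_n)^2}$ (11)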


Figure 3.2.3 [$d = \|\overrightarrow{P_1P_2}\|$]


We noted in the previous section that n-tuples 
can be viewed either as vectors or points in $R^n$.
In Definition 2 we chose to describe them as 
points, as that seemed the more natural 
interpretation. 


EXAMPLE 4 Calculating Distance in $R^n$


If
$\mathbf{u} = (1, 3, -2, 7)$ and $\mathbf{v} = (0, 7, 2, 2)$

then the distance between u and v is

$d(\mathbf{u}, \mathbf{v}) = \sqrt{(1 - 0)^2 + (3 - 7)^2 + (-2 - 2)^2 + (7 - 2)^2} = \sqrt{58}$
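Formula 11 can be evaluated with a short Python sketch. The function name `distance` is an assumption for illustration, and the result is a floating-point approximation.

```python
import math


def distance(u, v):
    """Distance between two points in n-space, computed as the norm of u - v."""
    if len(u) != len(v):
        raise ValueError("points must lie in the same n-space")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# The points of Example 4
d = distance((1, 3, -2, 7), (0, 7, 2, 2))
print(d)  # approximately sqrt(58)
```

Swapping the two arguments leaves the result unchanged, since each difference is squared before the terms are summed.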


Dot Product 


Our next objective is to define a useful multiplication operation on vectors in $R^2$ and $R^3$ and then extend that
operation to $R^n$. To do this we will first need to define exactly what we mean by the "angle" between two
vectors in $R^2$ or $R^3$. For this purpose, let u and v be nonzero vectors in $R^2$ or $R^3$ that have been positioned so
that their initial points coincide. We define the angle between u and v to be the angle θ determined by u and v
that satisfies the inequalities $0 \le \theta \le \pi$ (Figure 3.2.4).


DEFINITION 3 


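If u and v are nonzero vectors in $R^2$ or $R^3$, and if θ is the angle between u and v, then the dot product
(also called the Euclidean inner product) of u and v is denoted by $\mathbf{u} \cdot \mathbf{v}$ and is defined as

$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\| \cos\theta$ (12)

If $\mathbf{u} = \mathbf{0}$ or $\mathbf{v} = \mathbf{0}$, then we define $\mathbf{u} \cdot \mathbf{v}$ to be 0.


The angle θ between u and v satisfies $0 \le \theta \le \pi$.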


Figure 3.2.4 


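The sign of the dot product reveals information about the angle θ that we can obtain by rewriting Formula 12
as

$\cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}$ (13)

Since $0 \le \theta \le \pi$, it follows from Formula 13 and properties of the cosine function studied in trigonometry that
• θ is acute if $\mathbf{u} \cdot \mathbf{v} > 0$.
• θ is obtuse if $\mathbf{u} \cdot \mathbf{v} < 0$.
• θ = π/2 if $\mathbf{u} \cdot \mathbf{v} = 0$.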


EXAMPLE 5 Dot Product


Find the dot product of the vectors shown in Figure 3.2.5.


Figure 3.2.5


Solution The lengths of the vectors are

$\|\mathbf{u}\| = 1$ and $\|\mathbf{v}\| = \sqrt{8} = 2\sqrt{2}$

and the cosine of the angle θ between them is

$\cos(45^\circ) = 1/\sqrt{2}$

Thus, it follows from Formula 12 that

$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\| \cos\theta = (1)(2\sqrt{2})(1/\sqrt{2}) = 2$


EXAMPLE 6 A Geometry Problem Solved Using Dot Product


Find the angle between a diagonal of a cube and one of its edges.


Solution Let k be the length of an edge and introduce a coordinate system as shown in Figure 3.2.6.
If we let $\mathbf{u}_1 = (k, 0, 0)$, $\mathbf{u}_2 = (0, k, 0)$, and $\mathbf{u}_3 = (0, 0, k)$, then the vector

$\mathbf{d} = (k, k, k) = \mathbf{u}_1 + \mathbf{u}_2 + \mathbf{u}_3$

is a diagonal of the cube. It follows from Formula 13 that the angle θ between d and the edge $\mathbf{u}_1$
satisfies

$\cos\theta = \frac{\mathbf{u}_1 \cdot \mathbf{d}}{\|\mathbf{u}_1\|\,\|\mathbf{d}\|} = \frac{k^2}{(k)(\sqrt{3k^2})} = \frac{1}{\sqrt{3}}$

With the help of a calculator we obtain

$\theta = \cos^{-1}\left(\frac{1}{\sqrt{3}}\right) \approx 54.74^\circ$


Figure 3.2.6


Note that the angle θ obtained in Example 6
does not involve k. Why was this to be 
expected? 
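The calculator step in Example 6 can be reproduced in Python. The sketch below applies Formula 13 using the component form of the dot product, which the text derives in the next subsection as Formula 15; the function name `angle_between` and the clamping of the cosine to the interval from -1 to 1 (a guard against floating-point rounding) are assumptions of this sketch.

```python
import math


def angle_between(u, v):
    """Angle in radians between two nonzero vectors, from Formula 13."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("the angle is defined only for nonzero vectors")
    cos_theta = dot / (norm_u * norm_v)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


# Edge and diagonal of a cube with edge length k, for two choices of k
for k in (1, 5):
    theta = angle_between((k, 0, 0), (k, k, k))
    print(k, math.degrees(theta))  # about 54.74 degrees in both cases
```

Running the loop with different edge lengths gives the same angle each time, which is consistent with the observation that the answer does not involve k.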


Component Form of the Dot Product 


For computational purposes it is desirable to have a formula that expresses the dot product of two vectors in 
terms of components. We will derive such a formula for vectors in 3-space; the derivation for vectors in 


2-space is similar. 


Let $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$ be two nonzero vectors. If, as shown in Figure 3.2.7, θ is the angle
between u and v, then the law of cosines yields

$\|\overrightarrow{PQ}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - 2\|\mathbf{u}\|\,\|\mathbf{v}\| \cos\theta$ (14)


Josiah Willard Gibbs (1839-1903) 


Historical Note The dot product notation was first introduced by the American physicist and 
mathematician J. Willard Gibbs in a pamphlet distributed to his students at Yale University in the 
1880s. The product was originally written on the baseline, rather than centered as today, and was 
referred to as the direct product. Gibbs's pamphlet was eventually incorporated into a book entitled 
Vector Analysis that was published in 1901 and coauthored with one of his students. Gibbs made major 
contributions to the fields of thermodynamics and electromagnetic theory and is generally regarded as 
the greatest American physicist of the nineteenth century. 

[Image: The Granger Collection, New York] 


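Since $\overrightarrow{PQ} = \mathbf{v} - \mathbf{u}$, we can rewrite 14 as

\[ \|\mathbf{u}\|\|\mathbf{v}\|\cos\theta = \tfrac{1}{2}\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - \|\mathbf{v}-\mathbf{u}\|^2\right) \]
or
\[ \mathbf{u}\cdot\mathbf{v} = \tfrac{1}{2}\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - \|\mathbf{v}-\mathbf{u}\|^2\right) \]
Substituting
\[ \|\mathbf{u}\|^2 = u_1^2 + u_2^2 + u_3^2, \qquad \|\mathbf{v}\|^2 = v_1^2 + v_2^2 + v_3^2 \]
and
\[ \|\mathbf{v}-\mathbf{u}\|^2 = (v_1 - u_1)^2 + (v_2 - u_2)^2 + (v_3 - u_3)^2 \]

we obtain, after simplifying,

\[ \mathbf{u}\cdot\mathbf{v} = u_1v_1 + u_2v_2 + u_3v_3 \tag{15} \]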


Although we derived Formula 15 and its 2-space companion under the assumption that u and v are nonzero, it turns out that these formulas are also applicable if $\mathbf{u} = \mathbf{0}$ or $\mathbf{v} = \mathbf{0}$ (verify).


The companion formula for vectors in 2-space is
\[ \mathbf{u}\cdot\mathbf{v} = u_1v_1 + u_2v_2 \tag{16} \]


Motivated by the pattern in Formulas 15 and 16, we make the following definition. 


DEFINITION 4 


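If $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are vectors in $R^n$, then the dot product (also called the Euclidean inner product) of u and v is denoted by $\mathbf{u}\cdot\mathbf{v}$ and is defined by

\[ \mathbf{u}\cdot\mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n \tag{17} \]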


In words, to calculate the dot product 
(Euclidean inner product) multiply 
corresponding components and add the 
resulting products. 


EXAMPLE 7 Calculating Dot Products Using Components 


(a) Use Formula 15 to compute the dot product of the vectors u and v in Example 5.
(b) Calculate $\mathbf{u}\cdot\mathbf{v}$ for the following vectors in $R^4$:
\[ \mathbf{u} = (-1, 3, 5, 7), \qquad \mathbf{v} = (-3, -4, 1, 0) \]


Solution
(a) The component forms of the vectors are $\mathbf{u} = (0, 0, 1)$ and $\mathbf{v} = (0, 2, 2)$. Thus,
\[ \mathbf{u}\cdot\mathbf{v} = (0)(0) + (0)(2) + (1)(2) = 2 \]

which agrees with the result obtained geometrically in Example 5.

(b)
\[ \mathbf{u}\cdot\mathbf{v} = (-1)(-3) + (3)(-4) + (5)(1) + (7)(0) = -4 \]
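Definition 4 translates directly into a few lines of code. The following Python sketch (the function name `dot` is our own choice for illustration, not notation from the text) multiplies corresponding components and adds the products. It is checked against part (a) of Example 7 and against the vectors of Exercise 9(b) at the end of this section.

```python
def dot(u, v):
    """Dot product of two vectors given as equal-length sequences (Formula 17)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    # Multiply corresponding components and add the resulting products.
    return sum(ui * vi for ui, vi in zip(u, v))


# Example 7(a): the vectors of Example 5 in component form.
print(dot((0, 0, 1), (0, 2, 2)))               # 2
# Exercise 9(b): two vectors in R^4 whose dot product is zero.
print(dot((1, 1, 4, 6), (2, -2, 3, -2)))       # 0
```

Raising an error on mismatched lengths is a design choice of this sketch; the dot product is simply undefined for vectors with different numbers of components.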




Figure 3.2.7 


Algebraic Properties of the Dot Product 
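In the special case where $\mathbf{u} = \mathbf{v}$ in Definition 4, we obtain the relationship
\[ \mathbf{v}\cdot\mathbf{v} = v_1^2 + v_2^2 + \cdots + v_n^2 = \|\mathbf{v}\|^2 \tag{18} \]


This yields the following formula for expressing the length of a vector in terms of a dot product:


\[ \|\mathbf{v}\| = \sqrt{\mathbf{v}\cdot\mathbf{v}} \tag{19} \]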


Dot products have many of the same algebraic properties as products of real numbers. 


THEOREM 3.2.2 


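If u, v, and w are vectors in $R^n$, and if k is a scalar, then:

(a) $\mathbf{u}\cdot\mathbf{v} = \mathbf{v}\cdot\mathbf{u}$ [Symmetry property]

(b) $\mathbf{u}\cdot(\mathbf{v}+\mathbf{w}) = \mathbf{u}\cdot\mathbf{v} + \mathbf{u}\cdot\mathbf{w}$ [Distributive property]

(c) $k(\mathbf{u}\cdot\mathbf{v}) = (k\mathbf{u})\cdot\mathbf{v}$ [Homogeneity property]

(d) $\mathbf{v}\cdot\mathbf{v} \ge 0$ and $\mathbf{v}\cdot\mathbf{v} = 0$ if and only if $\mathbf{v} = \mathbf{0}$ [Positivity property]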


We will prove parts (c) and (d) and leave the other proofs as exercises. 


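Proof (c) Let $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$. Then

\[
\begin{aligned}
k(\mathbf{u}\cdot\mathbf{v}) &= k(u_1v_1 + u_2v_2 + \cdots + u_nv_n) \\
&= (ku_1)v_1 + (ku_2)v_2 + \cdots + (ku_n)v_n = (k\mathbf{u})\cdot\mathbf{v}
\end{aligned}
\]

Proof (d) The result follows from parts (a) and (b) of Theorem 3.2.1 and the fact that

\[ \mathbf{v}\cdot\mathbf{v} = v_1v_1 + v_2v_2 + \cdots + v_nv_n = v_1^2 + v_2^2 + \cdots + v_n^2 = \|\mathbf{v}\|^2 \]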


The next theorem gives additional properties of dot products. The proofs can be obtained either by expressing 
the vectors in terms of components or by using the algebraic properties established in Theorem 3.2.2. 


THEOREM 3.2.3 


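If u, v, and w are vectors in $R^n$, and if k is a scalar, then:
(a) $\mathbf{0}\cdot\mathbf{v} = \mathbf{v}\cdot\mathbf{0} = 0$

(b) $(\mathbf{u}+\mathbf{v})\cdot\mathbf{w} = \mathbf{u}\cdot\mathbf{w} + \mathbf{v}\cdot\mathbf{w}$

(c) $\mathbf{u}\cdot(\mathbf{v}-\mathbf{w}) = \mathbf{u}\cdot\mathbf{v} - \mathbf{u}\cdot\mathbf{w}$

(d) $(\mathbf{u}-\mathbf{v})\cdot\mathbf{w} = \mathbf{u}\cdot\mathbf{w} - \mathbf{v}\cdot\mathbf{w}$

(e) $k(\mathbf{u}\cdot\mathbf{v}) = \mathbf{u}\cdot(k\mathbf{v})$


We will show how Theorem 3.2.2 can be used to prove part (b) without breaking the vectors into components. The other proofs are left as exercises.


Proof (b)
\[
\begin{aligned}
(\mathbf{u}+\mathbf{v})\cdot\mathbf{w} &= \mathbf{w}\cdot(\mathbf{u}+\mathbf{v}) && \text{[By symmetry]} \\
&= \mathbf{w}\cdot\mathbf{u} + \mathbf{w}\cdot\mathbf{v} && \text{[By distributivity]} \\
&= \mathbf{u}\cdot\mathbf{w} + \mathbf{v}\cdot\mathbf{w} && \text{[By symmetry]}
\end{aligned}
\]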


Formulas 18 and 19 together with Theorems 3.2.2 and 3.2.3 make it possible to manipulate expressions 
involving dot products using familiar algebraic techniques. 


EXAMPLE 8 Calculating with Dot Products 


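\[
\begin{aligned}
(\mathbf{u}-2\mathbf{v})\cdot(3\mathbf{u}+4\mathbf{v}) &= \mathbf{u}\cdot(3\mathbf{u}+4\mathbf{v}) - 2\mathbf{v}\cdot(3\mathbf{u}+4\mathbf{v}) \\
&= 3(\mathbf{u}\cdot\mathbf{u}) + 4(\mathbf{u}\cdot\mathbf{v}) - 6(\mathbf{v}\cdot\mathbf{u}) - 8(\mathbf{v}\cdot\mathbf{v}) \\
&= 3\|\mathbf{u}\|^2 - 2(\mathbf{u}\cdot\mathbf{v}) - 8\|\mathbf{v}\|^2
\end{aligned}
\]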


Cauchy—Schwarz Inequality and Angles in R" 


Our next objective is to extend to $R^n$ the notion of “angle” between nonzero vectors u and v. We will do this by starting with the formula

\[ \theta = \cos^{-1}\left(\frac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\|\mathbf{v}\|}\right) \tag{20} \]

which we previously derived for nonzero vectors in $R^2$ and $R^3$. Since dot products and norms have been defined for vectors in $R^n$, it would seem that this formula has all the ingredients to serve as a definition of the angle $\theta$ between two vectors, u and v, in $R^n$. However, there is a fly in the ointment, the problem being that the inverse cosine in Formula 20 is not defined unless its argument satisfies the inequalities

\[ -1 \le \frac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\|\mathbf{v}\|} \le 1 \tag{21} \]

Fortunately, these inequalities do hold for all nonzero vectors in $R^n$ as a consequence of the following fundamental result known as the Cauchy-Schwarz inequality.


THEOREM 3.2.4 Cauchy—Schwarz Inequality 


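If $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are vectors in $R^n$, then

\[ |\mathbf{u}\cdot\mathbf{v}| \le \|\mathbf{u}\|\|\mathbf{v}\| \tag{22} \]

or in terms of components

\[ |u_1v_1 + u_2v_2 + \cdots + u_nv_n| \le \left(u_1^2 + u_2^2 + \cdots + u_n^2\right)^{1/2}\left(v_1^2 + v_2^2 + \cdots + v_n^2\right)^{1/2} \tag{23} \]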


We will omit the proof of this theorem because later in the text we will prove a more general version of which this will be a special case. Our goal for now will be to use this theorem to prove that the inequalities in 21 hold for all nonzero vectors in $R^n$. Once that is done we will have established all the results required to use Formula 20 as our definition of the angle between nonzero vectors u and v in $R^n$.


To prove that the inequalities in 21 hold for all nonzero vectors in $R^n$, divide both sides of Formula 22 by the product $\|\mathbf{u}\|\|\mathbf{v}\|$ to obtain
\[ \frac{|\mathbf{u}\cdot\mathbf{v}|}{\|\mathbf{u}\|\|\mathbf{v}\|} \le 1 \quad\text{or equivalently}\quad \left|\frac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\|\mathbf{v}\|}\right| \le 1 \]

from which 21 follows.
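As a computational illustration of Formula 20, here is a minimal Python sketch. The helper names `dot`, `norm`, and `angle` are our own, and the clamping step is an addition that is not part of the text: the Cauchy-Schwarz inequality guarantees that the argument of the inverse cosine lies in $[-1, 1]$ in exact arithmetic, but floating-point roundoff can push it slightly outside that interval.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))          # Formula 19


def angle(u, v):
    """Angle in radians between nonzero vectors u and v (Formula 20)."""
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        raise ValueError("the angle is defined only for nonzero vectors")
    c = dot(u, v) / (nu * nv)
    # Guard against roundoff only; exact arithmetic never needs this clamp.
    c = max(-1.0, min(1.0, c))
    return math.acos(c)


# Example 6 with k = 1: a diagonal of the cube and one of its edges.
theta = angle((1, 1, 1), (1, 0, 0))
print(round(math.degrees(theta), 2))     # 54.74
```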


Hermann Amandus Schwarz (1843-1921) 


Viktor Yakovlevich Bunyakovsky (1804-1889) 


Historical Note The Cauchy—Schwarz inequality is named in honor of the French mathematician 
Augustin Cauchy (see p. 109) and the German mathematician Hermann Schwarz. Variations of this 
inequality occur in many different settings and under various names. Depending on the context in 
which the inequality occurs, you may find it called Cauchy's inequality, the Schwarz inequality, or 
sometimes even the Bunyakovsky inequality, in recognition of the Russian mathematician who 
published his version of the inequality in 1859, about 25 years before Schwarz. 

[Images: wikipedia (Schwarz); wikipedia (Bunyakovsky)]


Geometry in R" 


Earlier in this section we extended various concepts to $R^n$ with the idea that familiar results that we can visualize in $R^2$ and $R^3$ might be valid in $R^n$ as well. Here are two fundamental theorems from plane geometry whose validity extends to $R^n$:
• The sum of the lengths of two sides of a triangle is at least as large as the third (Figure 3.2.8).
• The shortest distance between two points is a straight line (Figure 3.2.9).


The following theorem generalizes these theorems to $R^n$.


THEOREM 3.2.5 


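If u, v, and w are vectors in $R^n$, and if k is any scalar, then:
(a) $\|\mathbf{u}+\mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$ [Triangle inequality for vectors]
(b) $d(\mathbf{u},\mathbf{v}) \le d(\mathbf{u},\mathbf{w}) + d(\mathbf{w},\mathbf{v})$ [Triangle inequality for distances]


Proof (a)

\[
\begin{aligned}
\|\mathbf{u}+\mathbf{v}\|^2 &= (\mathbf{u}+\mathbf{v})\cdot(\mathbf{u}+\mathbf{v}) = (\mathbf{u}\cdot\mathbf{u}) + 2(\mathbf{u}\cdot\mathbf{v}) + (\mathbf{v}\cdot\mathbf{v}) \\
&= \|\mathbf{u}\|^2 + 2(\mathbf{u}\cdot\mathbf{v}) + \|\mathbf{v}\|^2 \\
&\le \|\mathbf{u}\|^2 + 2|\mathbf{u}\cdot\mathbf{v}| + \|\mathbf{v}\|^2 && \text{[Property of absolute value]} \\
&\le \|\mathbf{u}\|^2 + 2\|\mathbf{u}\|\|\mathbf{v}\| + \|\mathbf{v}\|^2 && \text{[Cauchy-Schwarz inequality]} \\
&= (\|\mathbf{u}\| + \|\mathbf{v}\|)^2
\end{aligned}
\]

Since both sides of the inequality in part (a) are nonnegative, taking square roots completes the proof.


Proof (b) It follows from part (a) and Formula 11 that

\[
\begin{aligned}
d(\mathbf{u},\mathbf{v}) &= \|\mathbf{u}-\mathbf{v}\| = \|(\mathbf{u}-\mathbf{w}) + (\mathbf{w}-\mathbf{v})\| \\
&\le \|\mathbf{u}-\mathbf{w}\| + \|\mathbf{w}-\mathbf{v}\| = d(\mathbf{u},\mathbf{w}) + d(\mathbf{w},\mathbf{v})
\end{aligned}
\]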


Figure 3.2.8 $\|\mathbf{u}+\mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$


Figure 3.2.9 $d(\mathbf{u},\mathbf{v}) \le d(\mathbf{u},\mathbf{w}) + d(\mathbf{w},\mathbf{v})$


It is proved in plane geometry that for any parallelogram the sum of the squares of the diagonals is equal to the 
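sum of the squares of the four sides (Figure 3.2.10). The following theorem generalizes that result to $R^n$.


THEOREM 3.2.6 Parallelogram Equation for Vectors


If u and v are vectors in $R^n$, then

\[ \|\mathbf{u}+\mathbf{v}\|^2 + \|\mathbf{u}-\mathbf{v}\|^2 = 2\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2\right) \tag{24} \]
Proof
\[
\begin{aligned}
\|\mathbf{u}+\mathbf{v}\|^2 + \|\mathbf{u}-\mathbf{v}\|^2 &= (\mathbf{u}+\mathbf{v})\cdot(\mathbf{u}+\mathbf{v}) + (\mathbf{u}-\mathbf{v})\cdot(\mathbf{u}-\mathbf{v}) \\
&= 2(\mathbf{u}\cdot\mathbf{u}) + 2(\mathbf{v}\cdot\mathbf{v}) \\
&= 2\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2\right)
\end{aligned}
\]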
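A quick numerical check can make Theorems 3.2.5 and 3.2.6 concrete. The sketch below is illustrative only: it uses the vectors u = (2, -2, 3) and v = (1, -3, 4) from Exercises 3-4, and the helper names `dot`, `add`, and `sub` are our own. Squared norms are computed as dot products so that the parallelogram equation is tested in exact integer arithmetic.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def add(u, v):
    return tuple(ui + vi for ui, vi in zip(u, v))


def sub(u, v):
    return tuple(ui - vi for ui, vi in zip(u, v))


u, v = (2, -2, 3), (1, -3, 4)

# Parallelogram equation (24), using ||x||^2 = x . x (Formula 18).
lhs = dot(add(u, v), add(u, v)) + dot(sub(u, v), sub(u, v))
rhs = 2 * (dot(u, u) + dot(v, v))
print(lhs, rhs)                      # 86 86

# Triangle inequality for vectors, Theorem 3.2.5(a).
norm_sum = math.sqrt(dot(add(u, v), add(u, v)))
print(norm_sum <= math.sqrt(dot(u, u)) + math.sqrt(dot(v, v)))   # True
```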


Figure 3.2.10 


We could state and prove many more theorems from plane geometry that generalize to $R^n$, but the ones already given should suffice to convince you that $R^n$ is not so different from $R^2$ and $R^3$ even though we cannot visualize it directly. The next theorem establishes a fundamental relationship between the dot product and norm in $R^n$.


THEOREM 3.2.7 


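If u and v are vectors in $R^n$ with the Euclidean inner product, then

\[ \mathbf{u}\cdot\mathbf{v} = \tfrac{1}{4}\|\mathbf{u}+\mathbf{v}\|^2 - \tfrac{1}{4}\|\mathbf{u}-\mathbf{v}\|^2 \tag{25} \]
Proof
\[
\begin{aligned}
\|\mathbf{u}+\mathbf{v}\|^2 &= (\mathbf{u}+\mathbf{v})\cdot(\mathbf{u}+\mathbf{v}) = \|\mathbf{u}\|^2 + 2(\mathbf{u}\cdot\mathbf{v}) + \|\mathbf{v}\|^2 \\
\|\mathbf{u}-\mathbf{v}\|^2 &= (\mathbf{u}-\mathbf{v})\cdot(\mathbf{u}-\mathbf{v}) = \|\mathbf{u}\|^2 - 2(\mathbf{u}\cdot\mathbf{v}) + \|\mathbf{v}\|^2
\end{aligned}
\]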


from which 25 follows by simple algebra. 


Note that Formula 25 expresses the dot product 
in terms of norms. 


Dot Products as Matrix Multiplication 


There are various ways to express the dot product of vectors using matrix notation. The formulas depend on 
whether the vectors are expressed as row matrices or column matrices. Here are the possibilities. 


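If A is an $n \times n$ matrix and u and v are $n \times 1$ matrices, then it follows from the first row in Table 1 and properties of the transpose that

\[
\begin{aligned}
A\mathbf{u}\cdot\mathbf{v} &= \mathbf{v}^T(A\mathbf{u}) = (\mathbf{v}^TA)\mathbf{u} = (A^T\mathbf{v})^T\mathbf{u} = \mathbf{u}\cdot A^T\mathbf{v} \\
\mathbf{u}\cdot A\mathbf{v} &= (A\mathbf{v})^T\mathbf{u} = (\mathbf{v}^TA^T)\mathbf{u} = \mathbf{v}^T(A^T\mathbf{u}) = A^T\mathbf{u}\cdot\mathbf{v}
\end{aligned}
\]
The resulting formulas

\[ A\mathbf{u}\cdot\mathbf{v} = \mathbf{u}\cdot A^T\mathbf{v} \tag{26} \]

\[ \mathbf{u}\cdot A\mathbf{v} = A^T\mathbf{u}\cdot\mathbf{v} \tag{27} \]

provide an important link between multiplication by an $n \times n$ matrix A and multiplication by $A^T$.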


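EXAMPLE 9 Verifying That $A\mathbf{u}\cdot\mathbf{v} = \mathbf{u}\cdot A^T\mathbf{v}$


Suppose that

\[ A = \begin{bmatrix} 1 & -2 & 3 \\ 2 & 4 & 1 \\ -1 & 0 & 1 \end{bmatrix}, \quad \mathbf{u} = \begin{bmatrix} -1 \\ 2 \\ 4 \end{bmatrix}, \quad \mathbf{v} = \begin{bmatrix} -2 \\ 0 \\ 5 \end{bmatrix} \]

Then

\[ A\mathbf{u} = \begin{bmatrix} 1 & -2 & 3 \\ 2 & 4 & 1 \\ -1 & 0 & 1 \end{bmatrix}\begin{bmatrix} -1 \\ 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 7 \\ 10 \\ 5 \end{bmatrix} \]

\[ A^T\mathbf{v} = \begin{bmatrix} 1 & 2 & -1 \\ -2 & 4 & 0 \\ 3 & 1 & 1 \end{bmatrix}\begin{bmatrix} -2 \\ 0 \\ 5 \end{bmatrix} = \begin{bmatrix} -7 \\ 4 \\ -1 \end{bmatrix} \]

from which we obtain

\[
\begin{aligned}
A\mathbf{u}\cdot\mathbf{v} &= 7(-2) + 10(0) + 5(5) = 11 \\
\mathbf{u}\cdot A^T\mathbf{v} &= (-1)(-7) + 2(4) + 4(-1) = 11
\end{aligned}
\]

Thus, $A\mathbf{u}\cdot\mathbf{v} = \mathbf{u}\cdot A^T\mathbf{v}$ as guaranteed by Formula 26. We leave it for you to verify that Formula 27 also holds.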
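The verification in Example 9, including the check of Formula 27 that is left to the reader, can be sketched in plain Python. The helper names are our own. One assumption is made: the third column of A is taken to be (3, 1, 1), which is the only choice consistent with the products $A\mathbf{u} = (7, 10, 5)$ and $A^T\mathbf{v} = (-7, 4, -1)$ displayed in the example.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def mat_vec(A, x):
    """Product of the matrix A (a list of rows) with the column vector x."""
    return [dot(row, x) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


A = [[1, -2, 3],
     [2, 4, 1],
     [-1, 0, 1]]      # third column inferred from Au and A^T v (assumption)
u = [-1, 2, 4]
v = [-2, 0, 5]
At = transpose(A)

# Formula 26: Au . v = u . A^T v
print(dot(mat_vec(A, u), v), dot(u, mat_vec(At, v)))    # 11 11
# Formula 27: u . Av = A^T u . v
print(dot(u, mat_vec(A, v)), dot(mat_vec(At, u), v))    # 17 17
```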


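Table 1

Form | Dot Product | Example
u a column matrix and v a column matrix | $\mathbf{u}\cdot\mathbf{v} = \mathbf{u}^T\mathbf{v} = \mathbf{v}^T\mathbf{u}$ | $\mathbf{u} = [1\ \ {-3}\ \ 5]^T$, $\mathbf{v} = [5\ \ 4\ \ 0]^T$: $\mathbf{u}^T\mathbf{v} = -7$, $\mathbf{v}^T\mathbf{u} = -7$
u a row matrix and v a column matrix | $\mathbf{u}\cdot\mathbf{v} = \mathbf{u}\mathbf{v} = \mathbf{v}^T\mathbf{u}^T$ | $\mathbf{u} = [1\ \ {-3}\ \ 5]$, $\mathbf{v} = [5\ \ 4\ \ 0]^T$: $\mathbf{u}\mathbf{v} = -7$, $\mathbf{v}^T\mathbf{u}^T = -7$
u a column matrix and v a row matrix | $\mathbf{u}\cdot\mathbf{v} = \mathbf{v}\mathbf{u} = \mathbf{u}^T\mathbf{v}^T$ | $\mathbf{u} = [1\ \ {-3}\ \ 5]^T$, $\mathbf{v} = [5\ \ 4\ \ 0]$: $\mathbf{v}\mathbf{u} = -7$, $\mathbf{u}^T\mathbf{v}^T = -7$
u a row matrix and v a row matrix | $\mathbf{u}\cdot\mathbf{v} = \mathbf{u}\mathbf{v}^T = \mathbf{v}\mathbf{u}^T$ | $\mathbf{u} = [1\ \ {-3}\ \ 5]$, $\mathbf{v} = [5\ \ 4\ \ 0]$: $\mathbf{u}\mathbf{v}^T = -7$, $\mathbf{v}\mathbf{u}^T = -7$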

A Dot Product View of Matrix Multiplication 


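Dot products provide another way of thinking about matrix multiplication. Recall that if $A = [a_{ij}]$ is an $m \times r$ matrix and $B = [b_{ij}]$ is an $r \times n$ matrix, then the $ij$th entry of AB is

\[ a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ir}b_{rj} \]
which is the dot product of the $i$th row vector of A
\[ [a_{i1}\ \ a_{i2}\ \ \cdots\ \ a_{ir}] \]
and the $j$th column vector of B
\[ \begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{rj} \end{bmatrix} \]

Thus, if the row vectors of A are $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_m$ and the column vectors of B are $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$, then the matrix product AB can be expressed as

\[ AB = \begin{bmatrix} \mathbf{r}_1\cdot\mathbf{c}_1 & \mathbf{r}_1\cdot\mathbf{c}_2 & \cdots & \mathbf{r}_1\cdot\mathbf{c}_n \\ \mathbf{r}_2\cdot\mathbf{c}_1 & \mathbf{r}_2\cdot\mathbf{c}_2 & \cdots & \mathbf{r}_2\cdot\mathbf{c}_n \\ \vdots & \vdots & & \vdots \\ \mathbf{r}_m\cdot\mathbf{c}_1 & \mathbf{r}_m\cdot\mathbf{c}_2 & \cdots & \mathbf{r}_m\cdot\mathbf{c}_n \end{bmatrix} \tag{28} \]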
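The row-times-column description of AB can be written almost word for word in code. The following Python sketch is illustrative: the function names and the two small matrices are made up for the demonstration and do not come from the text.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def mat_mul(A, B):
    """Product AB whose (i, j) entry is (row i of A) . (column j of B)."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("the number of columns of A must equal the number of rows of B")
    cols = list(zip(*B))                 # column vectors c_1, ..., c_n of B
    return [[dot(r, c) for c in cols] for r in A]


A = [[1, 2, 4],
     [2, 6, 0]]                          # a 2 x 3 matrix (illustrative)
B = [[4, 1],
     [0, -1],
     [2, 7]]                             # a 3 x 2 matrix (illustrative)
print(mat_mul(A, B))                     # [[12, 27], [8, -4]]
```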


Application of Dot Products to ISBN Numbers 


Although the system has recently changed, most books published in the last 25 years have been 
assigned a unique 10-digit number called an International Standard Book Number or ISBN. The first 
nine digits of this number are split into three groups—the first group representing the country or group 
of countries in which the book originates, the second identifying the publisher, and the third assigned to 
the book title itself. The tenth and final digit, called a check digit, is computed from the first nine digits 
and is used to ensure that an electronic transmission of the ISBN, say over the Internet, occurs without 
error. 


To explain how this is done, regard the first nine digits of the ISBN as a vector b in $R^9$, and let a be the vector
\[ \mathbf{a} = (1, 2, 3, 4, 5, 6, 7, 8, 9) \]

Then the check digit c is computed using the following procedure:
1. Form the dot product $\mathbf{a}\cdot\mathbf{b}$.
2. Divide $\mathbf{a}\cdot\mathbf{b}$ by 11, thereby producing a remainder c that is an integer between 0 and 10, inclusive.

The check digit is taken to be c, with the proviso that $c = 10$ is written as X to avoid double digits.
For example, the ISBN of the brief edition of Calculus, sixth edition, by Howard Anton is

0-471-15307-9
which has a check digit of 9. This is consistent with the first nine digits of the ISBN, since
\[ \mathbf{a}\cdot\mathbf{b} = (1, 2, 3, 4, 5, 6, 7, 8, 9)\cdot(0, 4, 7, 1, 1, 5, 3, 0, 7) = 152 \]

Dividing 152 by 11 produces a quotient of 13 and a remainder of 9, so the check digit is $c = 9$. If an electronic order is placed for a book with a certain ISBN, then the warehouse can use the above procedure to verify that the check digit is consistent with the first nine digits, thereby reducing the possibility of a costly shipping error.
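The two-step check-digit procedure is short enough to sketch in Python. The function name `isbn10_check_digit` and the decision to ignore hyphens in the input are our own assumptions for illustration.

```python
def isbn10_check_digit(first_nine):
    """Check digit (as a character) for the first nine digits of a 10-digit ISBN."""
    digits = [int(ch) for ch in first_nine if ch.isdigit()]   # hyphens are skipped
    if len(digits) != 9:
        raise ValueError("expected exactly nine digits")
    a = range(1, 10)                                   # a = (1, 2, ..., 9)
    c = sum(ai * bi for ai, bi in zip(a, digits)) % 11  # remainder of a . b on division by 11
    return "X" if c == 10 else str(c)


print(isbn10_check_digit("0-471-15307"))   # 9
```

To validate a complete ISBN, one would compare the returned character with the tenth character of the number.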


Concept Review 


• Norm (or length or magnitude) of a vector
• Unit vector
• Normalized vector
• Standard unit vectors
• Distance between points in $R^n$
• Angle between two vectors in $R^n$
• Dot product (or Euclidean inner product) of two vectors in $R^n$
• Cauchy-Schwarz inequality
• Triangle inequality
• Parallelogram equation for vectors

Skills 

• Compute the norm of a vector in $R^n$.
• Determine whether a given vector in $R^n$ is a unit vector.
• Normalize a nonzero vector in $R^n$.
• Determine the distance between two vectors in $R^n$.
• Compute the dot product of two vectors in $R^n$.
• Compute the angle between two nonzero vectors in $R^n$.
• Prove basic properties pertaining to norms and dot products (Theorems 3.2.1-3.2.3 and 3.2.5-3.2.7).


Exercise Set 3.2 


In Exercises 1—2, find the norm of v, a unit vector that has the same direction as v, and a unit vector that is 
oppositely directed to v. 


1. (a) $\mathbf{v} = (4, -3)$
(b) ¥= (a, ese) 
(c) ¥= ee 


Answer: 
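(a) $\|\mathbf{v}\| = 5$, $\dfrac{\mathbf{v}}{\|\mathbf{v}\|} = \left(\tfrac{4}{5}, -\tfrac{3}{5}\right)$, $-\dfrac{\mathbf{v}}{\|\mathbf{v}\|} = \left(-\tfrac{4}{5}, \tfrac{3}{5}\right)$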
= Talos ge geb ct=loge cae oF 
m2 aro Ce ie sp i ee 
i ees Os 
(c) |Iv|| = 15, TW Hg eee an a rc Aahekadaa 


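2. (a) $\mathbf{v} = (-5, 12)$
(b) $\mathbf{v} = (1, -1, 2)$
(c) $\mathbf{v} = (-2, 3, 3, -1)$
In Exercises 3-4, evaluate the given expression with $\mathbf{u} = (2, -2, 3)$, $\mathbf{v} = (1, -3, 4)$, and $\mathbf{w} = (3, 6, -4)$.
3. (a) $\|\mathbf{u} + \mathbf{v}\|$
(b) $\|\mathbf{u}\| + \|\mathbf{v}\|$
(c) $\|-2\mathbf{u} + 2\mathbf{v}\|$
(d) $\|3\mathbf{u} - 5\mathbf{v} + \mathbf{w}\|$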


Answer: 


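(a) $\|\mathbf{u} + \mathbf{v}\| = \sqrt{83}$

(b) $\|\mathbf{u}\| + \|\mathbf{v}\| = \sqrt{17} + \sqrt{26}$

(c) $\|-2\mathbf{u} + 2\mathbf{v}\| = 2\sqrt{3}$

(d) $\|3\mathbf{u} - 5\mathbf{v} + \mathbf{w}\| = \sqrt{466}$
4. (a) $\|\mathbf{u} + \mathbf{v} + \mathbf{w}\|$

(b) $\|\mathbf{u} - \mathbf{v}\|$

(c) $\|3\mathbf{v}\| - 3\|\mathbf{v}\|$

(d) $\|\mathbf{u}\| - \|\mathbf{v}\|$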


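In Exercises 5-6, evaluate the given expression with $\mathbf{u} = (-2, -1, 4, 5)$, $\mathbf{v} = (3, 1, -5, 7)$, and $\mathbf{w} = (-6, 2, 1, 1)$.
5. (a) $\|3\mathbf{u} - 5\mathbf{v} + \mathbf{w}\|$

(b) $\|3\mathbf{u}\| - 5\|\mathbf{v}\| + \|\mathbf{w}\|$

(c) $\|-\|\mathbf{u}\|\mathbf{v}\|$


Answer:


(a) $\|3\mathbf{u} - 5\mathbf{v} + \mathbf{w}\| = \sqrt{2570}$
(b) $\|3\mathbf{u}\| - 5\|\mathbf{v}\| + \|\mathbf{w}\| = 3\sqrt{46} - 10\sqrt{21} + \sqrt{42}$
(c) $\|-\|\mathbf{u}\|\mathbf{v}\| = 2\sqrt{966}$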


6. (a) $\|\mathbf{u}\| - 2\|\mathbf{v}\| - 3\|\mathbf{w}\|$
(b) $\|\mathbf{u}\| + \|-2\mathbf{v}\| + \|-3\mathbf{w}\|$


(c) $\|\|\mathbf{u} - \mathbf{v}\|\mathbf{w}\|$


7. Let $\mathbf{v} = (-2, 3, 0, 6)$. Find all scalars k such that $\|k\mathbf{v}\| = 5$.


Answer:
$k = \tfrac{5}{7}$, $k = -\tfrac{5}{7}$


8. Let $\mathbf{v} = (1, 1, 2, -3, 1)$. Find all scalars k such that $\|k\mathbf{v}\| = 4$.


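In Exercises 9-10, find $\mathbf{u}\cdot\mathbf{v}$, $\mathbf{u}\cdot\mathbf{u}$, and $\mathbf{v}\cdot\mathbf{v}$.


9. (a) $\mathbf{u} = (3, 1, 4)$, $\mathbf{v} = (2, 2, -4)$
(b) $\mathbf{u} = (1, 1, 4, 6)$, $\mathbf{v} = (2, -2, 3, -2)$


Answer:


(a) $\mathbf{u}\cdot\mathbf{v} = -8$, $\mathbf{u}\cdot\mathbf{u} = 26$, $\mathbf{v}\cdot\mathbf{v} = 24$


(b) $\mathbf{u}\cdot\mathbf{v} = 0$, $\mathbf{u}\cdot\mathbf{u} = 54$, $\mathbf{v}\cdot\mathbf{v} = 21$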


10. (a) u=(1, 1, — 2,3), v=(—1,0,5, 1) 
(b) u=(2, —1, 1,0, —2), v=(1, 2,2, 2,1) 


In Exercises 11—12, find the Euclidean distance between u and v. 


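11. (a) $\mathbf{u} = (3, 3, 3)$, $\mathbf{v} = (1, 0, 4)$
(b) $\mathbf{u} = (0, -2, -1, 1)$, $\mathbf{v} = (-3, 2, 4, 4)$
(c) $\mathbf{u} = (3, -3, -2, 0, -3, 13, 5)$, $\mathbf{v} = (-4, 1, -1, 5, 0, -11, 4)$


Answer:


(a) $\|\mathbf{u} - \mathbf{v}\| = \sqrt{14}$
(b) $\|\mathbf{u} - \mathbf{v}\| = \sqrt{59}$
(c) $\|\mathbf{u} - \mathbf{v}\| = \sqrt{677}$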


12. (a) $\mathbf{u} = (1, 2, -3, 0)$, $\mathbf{v} = (5, 1, 2, -2)$
(b) $\mathbf{u} = (2, -1, -4, 1, 0, 6, -3, 1)$, $\mathbf{v} = (-2, -1, 0, 3, 7, 2, -5, 1)$
(c) $\mathbf{u} = (0, 1, 1, 1, 2)$, $\mathbf{v} = (2, 1, 0, -1, 3)$


13. Find the cosine of the angle between the vectors in each part of Exercise 11, and then state whether the 
angle is acute, obtuse, or 90°. 


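Answer:
(a) $\cos\theta = \dfrac{15}{\sqrt{27}\sqrt{17}}$; $\theta$ is acute
(b) $\cos\theta = -\dfrac{4}{\sqrt{6}\sqrt{45}}$; $\theta$ is obtuse
(c) $\cos\theta = -\dfrac{136}{\sqrt{225}\sqrt{180}}$; $\theta$ is obtuse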


14. Find the cosine of the angle between the vectors in each part of Exercise 12, and then state whether the 
angle is acute, obtuse, or 90°. 


15. Suppose that a vector a in the xy-plane has a length of 9 units and points in a direction that is 120° counterclockwise from the positive x-axis, and a vector b in that plane has a length of 5 units and points in the positive y-direction. Find $\mathbf{a}\cdot\mathbf{b}$.


Answer:
$\mathbf{a}\cdot\mathbf{b} = \dfrac{45\sqrt{3}}{2}$


16. Suppose that a vector a in the xy-plane points in a direction that is 47° counterclockwise from the positive x-axis, and a vector b in that plane points in a direction that is 43° clockwise from the positive x-axis. What can you say about the value of $\mathbf{a}\cdot\mathbf{b}$?


In Exercises 17—18, determine whether the expression makes sense mathematically. If not, explain why. 


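17. (a) $\mathbf{u}\cdot(\mathbf{v}\cdot\mathbf{w})$
(b) $\mathbf{u}\cdot(\mathbf{v} + \mathbf{w})$
(c) $\|\mathbf{u}\cdot\mathbf{v}\|$
(d) $(\mathbf{u}\cdot\mathbf{v}) - \|\mathbf{u}\|$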


Answer:

(a) $\mathbf{u}\cdot(\mathbf{v}\cdot\mathbf{w})$ does not make sense because $\mathbf{v}\cdot\mathbf{w}$ is a scalar.
(b) $\mathbf{u}\cdot(\mathbf{v} + \mathbf{w})$ makes sense.
(c) $\|\mathbf{u}\cdot\mathbf{v}\|$ does not make sense because the quantity inside the norm is a scalar.
(d) $(\mathbf{u}\cdot\mathbf{v}) - \|\mathbf{u}\|$ makes sense since the terms are both scalars.


18. (a) $\|\mathbf{u}\|\cdot\|\mathbf{v}\|$
(b) $(\mathbf{u}\cdot\mathbf{v}) - \mathbf{w}$
(c) $(\mathbf{u}\cdot\mathbf{v}) - k$
(d) $k\cdot\mathbf{u}$


19. Find a unit vector that has the same direction as the given vector.
O34)
(b) $(1, 7)$
(c) $(-3, 2, \sqrt{3})$
(d) $(1, 2, 3, 4, 5)$


Answer:


20. Find a unit vector that is oppositely directed to the given vector.
(a) $(-12, -5)$
(b) $(3, -3, -3)$
(c) $(-6, 8)$
(d) $(-3, 1, \sqrt{6}, 3)$


21. State a procedure for finding a vector of a specified length m that points in the same direction as a given vector v.


ae a 4


22. If $\|\mathbf{v}\| = 2$ and $\|\mathbf{w}\| = 3$, what are the largest and smallest values possible for $\|\mathbf{v} - \mathbf{w}\|$? Give a geometric explanation of your results.


23. Find the cosine of the angle $\theta$ between u and v.
(a) $\mathbf{u} = (2, 3)$, $\mathbf{v} = (5, -7)$
(b) $\mathbf{u} = (-6, -2)$, $\mathbf{v} = (4, 0)$
(c) $\mathbf{u} = (1, -5, 4)$, $\mathbf{v} = (3, 3, 3)$
(d) $\mathbf{u} = (-2, 2, 3)$, $\mathbf{v} = (1, 7, -4)$


Answer:

(a) $\cos\theta = -\dfrac{11}{\sqrt{962}}$

(b) $\cos\theta = -\dfrac{3}{\sqrt{10}}$

(c) $\cos\theta = 0$

(d) $\cos\theta = 0$


24. Find the radian measure of the angle $\theta$ (with $0 \le \theta \le \pi$) between u and v.
(a) $(1, -7)$ and $(21, 3)$
(b) $(0, 2)$ and $(3, -3)$
(c) $(-1, 1, 0)$ and $(0, -1, 1)$
(d) $(1, -1, 0)$ and $(1, 0, 0)$


In Exercises 25—26, verify that the Cauchy-Schwarz inequality holds. 
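25. (a) $\mathbf{u} = (3, 2)$, $\mathbf{v} = (4, -1)$
(b) $\mathbf{u} = (-3, 1, 0)$, $\mathbf{v} = (2, -1, 3)$
(c) $\mathbf{u} = (0, 2, 2, 1)$, $\mathbf{v} = (1, 1, 1, 1)$


Answer:


(a) $|\mathbf{u}\cdot\mathbf{v}| = 10$, $\|\mathbf{u}\|\|\mathbf{v}\| = \sqrt{13}\sqrt{17} \approx 14.866$
(b) $|\mathbf{u}\cdot\mathbf{v}| = 7$, $\|\mathbf{u}\|\|\mathbf{v}\| = \sqrt{10}\sqrt{14} \approx 11.832$
(c) $|\mathbf{u}\cdot\mathbf{v}| = 5$, $\|\mathbf{u}\|\|\mathbf{v}\| = (3)(2) = 6$

26. (a) $\mathbf{u} = (4, 1, 1)$, $\mathbf{v} = (1, 2, 3)$
(b) $\mathbf{u} = (1, 2, 1, 2, 3)$, $\mathbf{v} = (0, 1, 1, 5, -2)$
(c) $\mathbf{u} = (1, 3, 5, 2, 0, 1)$, $\mathbf{v} = (0, 2, 4, 1, 3, 5)$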


27. Let $\mathbf{p}_0 = (x_0, y_0, z_0)$ and $\mathbf{p} = (x, y, z)$. Describe the set of all points $(x, y, z)$ for which $\|\mathbf{p} - \mathbf{p}_0\| = 1$.
Answer:


A sphere of radius 1 centered at $(x_0, y_0, z_0)$.


28. (a) Show that the components of the vector $\mathbf{v} = (v_1, v_2)$ in Figure Ex-28a are $v_1 = \|\mathbf{v}\|\cos\theta$ and $v_2 = \|\mathbf{v}\|\sin\theta$.
(b) Let u and v be the vectors in Figure Ex-28b. Use the result in part (a) to find the components of $4\mathbf{u} - 5\mathbf{v}$.


Figure Ex-28 


29. Prove parts (a) and (b) of Theorem 3.2.1.
30. Prove parts (a) and (c) of Theorem 3.2.3. 
31. Prove parts (d) and (e) of Theorem 3.2.3. 


32. Under what conditions will the triangle inequality (Theorem 3.2.5a) be an equality? Explain your answer 
geometrically. 


33. What can you say about two nonzero vectors, u and v, that satisfy the equation $\|\mathbf{u} + \mathbf{v}\| = \|\mathbf{u}\| + \|\mathbf{v}\|$?


34. (a) What relationship must hold for the point $\mathbf{p} = (a, b, c)$ to be equidistant from the origin and the xz-plane? Make sure that the relationship you state is valid for positive and negative values of a, b, and c.


(b) What relationship must hold for the point $\mathbf{p} = (a, b, c)$ to be farther from the origin than from the xz-plane? Make sure that the relationship you state is valid for positive and negative values of a, b, and c.


True-False Exercises 
In parts (a)-(j) determine whether the statement is true or false, and justify your answer. 
(a) If each component of a vector in $R^3$ is doubled, the norm of that vector is doubled.
Answer:
True

(b) In $R^2$, the vectors of norm 5 whose initial points are at the origin have terminal points lying on a circle of radius 5 centered at the origin.
Answer:
True

(c) Every vector in $R^n$ has a positive norm.
Answer:
False

(d) If v is a nonzero vector in $R^n$, there are exactly two unit vectors that are parallel to v.
Answer:
True

(e) If $\|\mathbf{u}\| = 2$, $\|\mathbf{v}\| = 1$, and $\mathbf{u}\cdot\mathbf{v} = 1$, then the angle between u and v is $\pi/3$ radians.
Answer:
True

(f) The expressions $(\mathbf{u}\cdot\mathbf{v}) + \mathbf{w}$ and $\mathbf{u}\cdot(\mathbf{v} + \mathbf{w})$ are both meaningful and equal to each other.
Answer:
False

(g) If $\mathbf{u}\cdot\mathbf{v} = \mathbf{u}\cdot\mathbf{w}$, then $\mathbf{v} = \mathbf{w}$.
Answer:
False

(h) If $\mathbf{u}\cdot\mathbf{v} = 0$, then either $\mathbf{u} = \mathbf{0}$ or $\mathbf{v} = \mathbf{0}$.
Answer:
False

(i) In $R^2$, if u lies in the first quadrant and v lies in the third quadrant, then $\mathbf{u}\cdot\mathbf{v}$ cannot be positive.
Answer:
True

(j) For all vectors u, v, and w in $R^n$, we have
\[ \|\mathbf{u} + \mathbf{v} + \mathbf{w}\| \le \|\mathbf{u}\| + \|\mathbf{v}\| + \|\mathbf{w}\| \]
Answer:
True




3.3 Orthogonality 


In the last section we defined the notion of “angle” between vectors in $R^n$. In this section we will focus on the notion of “perpendicularity.” Perpendicular vectors in $R^n$ play an important role in a wide variety of applications.


Orthogonal Vectors 


Recall from Formula 20 in the previous section that the angle $\theta$ between two nonzero vectors u and v in $R^n$ is defined by the formula

\[ \theta = \cos^{-1}\left(\frac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\|\mathbf{v}\|}\right) \]

It follows from this that $\theta = \pi/2$ if and only if $\mathbf{u}\cdot\mathbf{v} = 0$. Thus, we make the following definition.


DEFINITION 1 


Two nonzero vectors u and v in $R^n$ are said to be orthogonal (or perpendicular) if $\mathbf{u}\cdot\mathbf{v} = 0$. We will also agree that the zero vector in $R^n$ is orthogonal to every vector in $R^n$. A nonempty set of vectors in $R^n$ is called an orthogonal set if all pairs of distinct vectors in the set are orthogonal. An orthogonal set of unit vectors is called an orthonormal set.


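EXAMPLE 1 Orthogonal Vectors


(a) Show that $\mathbf{u} = (-2, 3, 1, 4)$ and $\mathbf{v} = (1, 2, 0, -1)$ are orthogonal vectors in $R^4$.


(b) Show that the set $S = \{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$ of standard unit vectors is an orthogonal set in $R^3$.


Solution
(a) The vectors are orthogonal since
\[ \mathbf{u}\cdot\mathbf{v} = (-2)(1) + (3)(2) + (1)(0) + (4)(-1) = 0 \]
(b) We must show that all pairs of distinct vectors are orthogonal, that is,
\[ \mathbf{i}\cdot\mathbf{j} = \mathbf{i}\cdot\mathbf{k} = \mathbf{j}\cdot\mathbf{k} = 0 \]
This is evident geometrically (Figure 3.2.2), but it can be seen as well from the computations

\[
\begin{aligned}
\mathbf{i}\cdot\mathbf{j} &= (1, 0, 0)\cdot(0, 1, 0) = 0 \\
\mathbf{i}\cdot\mathbf{k} &= (1, 0, 0)\cdot(0, 0, 1) = 0 \\
\mathbf{j}\cdot\mathbf{k} &= (0, 1, 0)\cdot(0, 0, 1) = 0
\end{aligned}
\]


In Example 1 there is no need to check that
\[ \mathbf{j}\cdot\mathbf{i} = \mathbf{k}\cdot\mathbf{i} = \mathbf{k}\cdot\mathbf{j} = 0 \]
since this follows from computations in the example and the symmetry property of the dot product.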


Lines and Planes Determined by Points and Normals 


One learns in analytic geometry that a line in $R^2$ is determined uniquely by its slope and one of its points, and that a plane in $R^3$ is determined uniquely by its “inclination” and one of its points. One way of specifying slope and inclination is to use a nonzero vector n, called a normal, that is orthogonal to the line or plane in question. For example, Figure 3.3.1 shows the line through the point $P_0(x_0, y_0)$ that has normal $\mathbf{n} = (a, b)$ and the plane through the point $P_0(x_0, y_0, z_0)$ that has normal $\mathbf{n} = (a, b, c)$. Both the line and the plane are represented by the vector equation

\[ \mathbf{n}\cdot\overrightarrow{P_0P} = 0 \tag{1} \]

where P is either an arbitrary point $(x, y)$ on the line or an arbitrary point $(x, y, z)$ in the plane. The vector $\overrightarrow{P_0P}$ can be expressed in terms of components as

\[ \overrightarrow{P_0P} = (x - x_0,\ y - y_0) \quad \text{[line]} \]

\[ \overrightarrow{P_0P} = (x - x_0,\ y - y_0,\ z - z_0) \quad \text{[plane]} \]

Substituting these components into Equation 1 gives

\[ a(x - x_0) + b(y - y_0) = 0 \quad \text{[line]} \tag{2} \]

\[ a(x - x_0) + b(y - y_0) + c(z - z_0) = 0 \quad \text{[plane]} \tag{3} \]
These are called the point-normal equations of the line and plane.


EXAMPLE 2 Point-Normal Equations


It follows from 2 that in $R^2$ the equation
\[ 6(x - 3) + (y + 7) = 0 \]
represents the line through the point $(3, -7)$ with normal $\mathbf{n} = (6, 1)$; and it follows from 3 that in $R^3$ the equation
\[ 4(x - 3) + 2y - 5(z - 7) = 0 \]
represents the plane through the point $(3, 0, 7)$ with normal $\mathbf{n} = (4, 2, -5)$.




Figure 3.3.1 


When convenient, the terms in Equations 2 and 3 can be multiplied out and the constants combined. This leads to the following 
theorem. 


THEOREM 3.3.1 


(a) If a and b are constants that are not both zero, then an equation of the form

\[ ax + by + c = 0 \tag{4} \]

represents a line in $R^2$ with normal $\mathbf{n} = (a, b)$.


(b) If a, b, and c are constants that are not all zero, then an equation of the form
\[ ax + by + cz + d = 0 \tag{5} \]

represents a plane in $R^3$ with normal $\mathbf{n} = (a, b, c)$.


EXAMPLE 3 Vectors Orthogonal to Lines and Planes Through the Origin 


(a) The equation $ax + by = 0$ represents a line through the origin in $R^2$. Show that the vector $\mathbf{n}_1 = (a, b)$ formed from the coefficients of the equation is orthogonal to the line, that is, orthogonal to every vector along the line.
(b) The equation $ax + by + cz = 0$ represents a plane through the origin in $R^3$. Show that the vector $\mathbf{n}_2 = (a, b, c)$ formed from the coefficients of the equation is orthogonal to the plane, that is, orthogonal to every vector that lies in the plane.


Solution We will solve both problems together. The two equations can be written as
\[ (a, b)\cdot(x, y) = 0 \quad\text{and}\quad (a, b, c)\cdot(x, y, z) = 0 \]
or, alternatively, as
\[ \mathbf{n}_1\cdot(x, y) = 0 \quad\text{and}\quad \mathbf{n}_2\cdot(x, y, z) = 0 \]
These equations show that $\mathbf{n}_1$ is orthogonal to every vector $(x, y)$ on the line and that $\mathbf{n}_2$ is orthogonal to every vector $(x, y, z)$ in the plane (Figure 3.3.1).


Recall that
\[ ax + by = 0 \quad\text{and}\quad ax + by + cz = 0 \]

are called homogeneous equations. Example 3 illustrates that homogeneous equations in two or three unknowns can be written in the vector form

\[ \mathbf{n}\cdot\mathbf{x} = 0 \tag{6} \]

where n is the vector of coefficients and x is the vector of unknowns. In $R^2$ this is called the vector form of a line through the origin, and in $R^3$ it is called the vector form of a plane through the origin.


Referring to Table 1 of Section 3.2, in what other ways can you write 6 if n and x are expressed in matrix form?


Orthogonal Projections 


In many applications it is necessary to “decompose” a vector u into a sum of two terms, one term being a scalar multiple of a specified nonzero vector a and the other term being orthogonal to a. For example, if u and a are vectors in $R^2$ that are positioned so their initial points coincide at a point Q, then we can create such a decomposition as follows (Figure 3.3.2):
• Drop a perpendicular from the tip of u to the line through a.
• Construct the vector $\mathbf{w}_1$ from Q to the foot of the perpendicular.
• Construct the vector $\mathbf{w}_2 = \mathbf{u} - \mathbf{w}_1$.


Figure 3.3.2 In parts (b) through (d), $\mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2$, where $\mathbf{w}_1$ is parallel to a and $\mathbf{w}_2$ is orthogonal to a.


Since
\[ \mathbf{w}_1 + \mathbf{w}_2 = \mathbf{w}_1 + (\mathbf{u} - \mathbf{w}_1) = \mathbf{u} \]

we have decomposed u into a sum of two orthogonal vectors, the first term being a scalar multiple of a and the second being orthogonal to a.


The following theorem shows that the foregoing results, which we illustrated using vectors in $R^2$, apply as well in $R^n$.


THEOREM 3.3.2 Projection Theorem 


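If u and a are vectors in $R^n$, and if $\mathbf{a} \ne \mathbf{0}$, then u can be expressed in exactly one way in the form $\mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2$, where $\mathbf{w}_1$ is a scalar multiple of a and $\mathbf{w}_2$ is orthogonal to a.


Proof Since the vector $\mathbf{w}_1$ is to be a scalar multiple of a, it must have the form

\[ \mathbf{w}_1 = k\mathbf{a} \tag{7} \]
Our goal is to find a value of the scalar k and a vector $\mathbf{w}_2$ that is orthogonal to a such that
\[ \mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2 \tag{8} \]

We can determine k by using 7 to rewrite 8 as
\[ \mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2 = k\mathbf{a} + \mathbf{w}_2 \]

and then applying Theorems 3.2.2 and 3.2.3 to obtain
\[ \mathbf{u}\cdot\mathbf{a} = (k\mathbf{a} + \mathbf{w}_2)\cdot\mathbf{a} = k\|\mathbf{a}\|^2 + (\mathbf{w}_2\cdot\mathbf{a}) \tag{9} \]

Since $\mathbf{w}_2$ is to be orthogonal to a, the last term in 9 must be 0, and hence k must satisfy the equation
\[ \mathbf{u}\cdot\mathbf{a} = k\|\mathbf{a}\|^2 \]
from which we obtain
\[ k = \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2} \]
as the only possible value for k. The proof can be completed by rewriting 8 as

\[ \mathbf{w}_2 = \mathbf{u} - \mathbf{w}_1 = \mathbf{u} - k\mathbf{a} = \mathbf{u} - \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\mathbf{a} \]

and then confirming that $\mathbf{w}_2$ is orthogonal to a by showing that $\mathbf{w}_2\cdot\mathbf{a} = 0$ (we leave the details for you).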
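The vectors $\mathbf{w}_1$ and $\mathbf{w}_2$ in the Projection Theorem have associated names: the vector $\mathbf{w}_1$ is called the orthogonal projection of u on a or sometimes the vector component of u along a, and the vector $\mathbf{w}_2$ is called the vector component of u orthogonal to a. The vector $\mathbf{w}_1$ is commonly denoted by the symbol $\operatorname{proj}_{\mathbf{a}}\mathbf{u}$, in which case it follows from 8 that $\mathbf{w}_2 = \mathbf{u} - \operatorname{proj}_{\mathbf{a}}\mathbf{u}$. In summary,

\[ \operatorname{proj}_{\mathbf{a}}\mathbf{u} = \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\mathbf{a} \quad \text{(vector component of u along a)} \tag{10} \]

\[ \mathbf{u} - \operatorname{proj}_{\mathbf{a}}\mathbf{u} = \mathbf{u} - \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\mathbf{a} \quad \text{(vector component of u orthogonal to a)} \tag{11} \]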
|all 


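EXAMPLE 4 Orthogonal Projection on a Line


Find the orthogonal projections of the vectors $\mathbf{e}_1 = (1, 0)$ and $\mathbf{e}_2 = (0, 1)$ on the line L that makes an angle $\theta$ with the positive x-axis in $R^2$.


Solution As illustrated in Figure 3.3.3, $\mathbf{a} = (\cos\theta, \sin\theta)$ is a unit vector along the line L, so our first problem is to find the orthogonal projection of $\mathbf{e}_1$ along a. Since

\[ \|\mathbf{a}\| = \sqrt{\sin^2\theta + \cos^2\theta} = 1 \quad\text{and}\quad \mathbf{e}_1\cdot\mathbf{a} = (1, 0)\cdot(\cos\theta, \sin\theta) = \cos\theta \]
it follows from Formula 10 that this projection is

\[ \operatorname{proj}_{\mathbf{a}}\mathbf{e}_1 = \frac{\mathbf{e}_1\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\mathbf{a} = (\cos\theta)(\cos\theta, \sin\theta) = (\cos^2\theta,\ \sin\theta\cos\theta) \]

Similarly, since $\mathbf{e}_2\cdot\mathbf{a} = (0, 1)\cdot(\cos\theta, \sin\theta) = \sin\theta$, it follows from Formula 10 that

\[ \operatorname{proj}_{\mathbf{a}}\mathbf{e}_2 = \frac{\mathbf{e}_2\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\mathbf{a} = (\sin\theta)(\cos\theta, \sin\theta) = (\sin\theta\cos\theta,\ \sin^2\theta) \]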


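EXAMPLE 5 Vector Component of u Along a


Let $\mathbf{u} = (2, -1, 3)$ and $\mathbf{a} = (4, -1, 2)$. Find the vector component of u along a and the vector component of u orthogonal to a.


Solution
\[
\begin{aligned}
\mathbf{u}\cdot\mathbf{a} &= (2)(4) + (-1)(-1) + (3)(2) = 15 \\
\|\mathbf{a}\|^2 &= 4^2 + (-1)^2 + 2^2 = 21
\end{aligned}
\]

Thus the vector component of u along a is

\[ \operatorname{proj}_{\mathbf{a}}\mathbf{u} = \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\mathbf{a} = \frac{15}{21}(4, -1, 2) = \left(\tfrac{20}{7}, -\tfrac{5}{7}, \tfrac{10}{7}\right) \]

and the vector component of u orthogonal to a is

\[ \mathbf{u} - \operatorname{proj}_{\mathbf{a}}\mathbf{u} = (2, -1, 3) - \left(\tfrac{20}{7}, -\tfrac{5}{7}, \tfrac{10}{7}\right) = \left(-\tfrac{6}{7}, -\tfrac{2}{7}, \tfrac{11}{7}\right) \]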


As a check, you may wish to verify that the vectors u — proj,u and a are perpendicular by showing that their dot 
product is zero. 
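The computation in Example 5, together with the suggested orthogonality check, can be sketched in Python. The function names are our own, and `fractions.Fraction` is used (an implementation choice, assuming integer components) so that the sevenths appear exactly rather than as rounded decimals.

```python
from fractions import Fraction


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def proj(u, a):
    """Orthogonal projection of u on a nonzero vector a (Formula 10)."""
    aa = dot(a, a)
    if aa == 0:
        raise ValueError("a must be a nonzero vector")
    k = Fraction(dot(u, a), aa)          # k = (u . a) / ||a||^2, kept exact
    return tuple(k * ai for ai in a)


u, a = (2, -1, 3), (4, -1, 2)
w1 = proj(u, a)                                  # vector component of u along a
w2 = tuple(ui - wi for ui, wi in zip(u, w1))     # vector component orthogonal to a
print([str(x) for x in w1])   # ['20/7', '-5/7', '10/7']
print([str(x) for x in w2])   # ['-6/7', '-2/7', '11/7']
print(dot(w2, a))             # 0
```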


Figure 3.3.3 


Sometimes we will be more interested in the norm of the vector component of u along a than in the vector component itself. A 
formula for this norm can be derived as follows: 


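\[ \|\operatorname{proj}_{\mathbf{a}}\mathbf{u}\| = \left\|\frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\mathbf{a}\right\| = \left|\frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\right|\|\mathbf{a}\| = \frac{|\mathbf{u}\cdot\mathbf{a}|}{\|\mathbf{a}\|^2}\|\mathbf{a}\| \]

where the second equality follows from part (c) of Theorem 3.2.1 and the third from the fact that $\|\mathbf{a}\|^2 > 0$. Thus,

\[ \|\operatorname{proj}_{\mathbf{a}}\mathbf{u}\| = \frac{|\mathbf{u}\cdot\mathbf{a}|}{\|\mathbf{a}\|} \tag{12} \]

If $\theta$ denotes the angle between u and a, then $\mathbf{u}\cdot\mathbf{a} = \|\mathbf{u}\|\|\mathbf{a}\|\cos\theta$, so 12 can also be written as
\[ \|\operatorname{proj}_{\mathbf{a}}\mathbf{u}\| = \|\mathbf{u}\||\cos\theta| \tag{13} \]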


(Verify.) A geometric interpretation of this result is given in Figure 3.3.4. 


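[Figure panels: (a) $0 \le \theta < \pi/2$; (b) $\pi/2 < \theta \le \pi$. In each panel the length of the projection is labeled in terms of $\|\mathbf{u}\|\cos\theta$.]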


Figure 3.3.4 


The Theorem of Pythagoras 


In Section 3.2 we found that many theorems about vectors in $R^2$ and $R^3$ also hold in $R^n$. Another example of this is the following generalization of the Theorem of Pythagoras (Figure 3.3.5).


THEOREM 3.3.3 Theorem of Pythagoras in $R^n$


If u and v are orthogonal vectors in $R^n$ with the Euclidean inner product, then

\[ \|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 \tag{14} \]

Proof Since u and v are orthogonal, we have $\mathbf{u}\cdot\mathbf{v} = 0$, from which it follows that

\[ \|\mathbf{u} + \mathbf{v}\|^2 = (\mathbf{u} + \mathbf{v})\cdot(\mathbf{u} + \mathbf{v}) = \|\mathbf{u}\|^2 + 2(\mathbf{u}\cdot\mathbf{v}) + \|\mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 \]


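EXAMPLE 6 Theorem of Pythagoras in $R^4$
We showed in Example 1 that the vectors
\[ \mathbf{u} = (-2, 3, 1, 4) \quad\text{and}\quad \mathbf{v} = (1, 2, 0, -1) \]

are orthogonal. Verify the Theorem of Pythagoras for these vectors.
Solution We leave it for you to confirm that

\[
\begin{aligned}
\mathbf{u} + \mathbf{v} &= (-1, 5, 1, 3) \\
\|\mathbf{u} + \mathbf{v}\|^2 &= 36 \\
\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 &= 30 + 6
\end{aligned}
\]
Thus, $\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$.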


Figure 3.3.5 


OPTIONAL 
Distance Problems 


We will now show how orthogonal projections can be used to solve the following three distance problems: 
Problem 1. Find the distance between a point and a line in $R^2$.
Problem 2. Find the distance between a point and a plane in $R^3$.
Problem 3. Find the distance between two parallel planes in $R^3$.


A method for solving the first two problems is provided by the next theorem. Since the proofs of the two parts are similar, we will prove part (b) and leave part (a) as an exercise.


THEOREM 3.3.4 


(a) In $R^2$ the distance D between the point $P_0(x_0, y_0)$ and the line $ax + by + c = 0$ is

\[ D = \frac{|ax_0 + by_0 + c|}{\sqrt{a^2 + b^2}} \tag{15} \]

(b) In $R^3$ the distance D between the point $P_0(x_0, y_0, z_0)$ and the plane $ax + by + cz + d = 0$ is

\[ D = \frac{|ax_0 + by_0 + cz_0 + d|}{\sqrt{a^2 + b^2 + c^2}} \tag{16} \]

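Proof (b) Let $Q(x_1, y_1, z_1)$ be any point in the plane. Position the normal $\mathbf{n} = (a, b, c)$ so that its initial point is at Q. As illustrated in Figure 3.3.6, the distance D is equal to the length of the orthogonal projection of $\overrightarrow{QP_0}$ on n. Thus, it follows from Formula 12 that

\[ D = \|\operatorname{proj}_{\mathbf{n}}\overrightarrow{QP_0}\| = \frac{|\overrightarrow{QP_0}\cdot\mathbf{n}|}{\|\mathbf{n}\|} \]
But
\[
\begin{aligned}
\overrightarrow{QP_0} &= (x_0 - x_1,\ y_0 - y_1,\ z_0 - z_1) \\
\overrightarrow{QP_0}\cdot\mathbf{n} &= a(x_0 - x_1) + b(y_0 - y_1) + c(z_0 - z_1) \\
\|\mathbf{n}\| &= \sqrt{a^2 + b^2 + c^2}
\end{aligned}
\]
Thus

\[ D = \frac{|a(x_0 - x_1) + b(y_0 - y_1) + c(z_0 - z_1)|}{\sqrt{a^2 + b^2 + c^2}} \tag{17} \]
Since the point $Q(x_1, y_1, z_1)$ lies in the given plane, its coordinates satisfy the equation of that plane; thus
\[ ax_1 + by_1 + cz_1 + d = 0 \]
or
\[ d = -ax_1 - by_1 - cz_1 \]
Substituting this expression in 17 yields 16.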


EXAMPLE 7 Distance Between a Point and a Plane
Find the distance D between the point (1, -4, -3) and the plane 2x - 3y + 6z = -1.


Solution Since the distance formulas in Theorem 3.3.4 require that the equations of the line and plane be written
with zero on the right side, we first need to rewrite the equation of the plane as


2x - 3y + 6z + 1 = 0
from which we obtain

$D = \dfrac{|2(1) + (-3)(-4) + 6(-3) + 1|}{\sqrt{2^2 + (-3)^2 + 6^2}} = \dfrac{|-3|}{7} = \dfrac{3}{7}$
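Formula 16 translates directly into a short function. The sketch below, in plain Python, uses a hypothetical helper name `point_plane_distance` and assumes the plane is already written with zero on the right side, as Theorem 3.3.4 requires.

```python
import math

def point_plane_distance(point, normal, d):
    """Distance from (x0, y0, z0) to the plane ax + by + cz + d = 0 (Formula 16).

    `normal` holds the coefficients (a, b, c) and must be nonzero.
    """
    numerator = abs(sum(n * p for n, p in zip(normal, point)) + d)
    return numerator / math.sqrt(sum(n * n for n in normal))

# Example 7: the plane 2x - 3y + 6z = -1 rewritten as 2x - 3y + 6z + 1 = 0.
D = point_plane_distance((1, -4, -3), (2, -3, 6), 1)
print(D)  # approximately 0.428571, that is, 3/7
```

Because the function only sums over the coordinates it is given, passing a two-component point and normal evaluates Formula 15 for a line in $R^2$ in the same way.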


Figure 3.3.6


The third distance problem posed above is to find the distance between two parallel planes in $R^3$. As suggested in Figure 3.3.7, the
distance between a plane V and a plane W can be obtained by finding any point $P_0$ in one of the planes, and computing the
distance between that point and the other plane. Here is an example.


Figure 3.3.7 The distance between the parallel planes V and W is equal to the distance between $P_0$ and W.


EXAMPLE 8 Distance Between Parallel Planes


The planes

x + 2y - 2z = 3 and 2x + 4y - 4z = 7
are parallel since their normals, (1, 2, -2) and (2, 4, -4), are parallel vectors. Find the distance between these
planes.


Solution To find the distance D between the planes, we can select an arbitrary point in one of the planes and
compute its distance to the other plane. By setting y = z = 0 in the equation x + 2y - 2z = 3, we obtain the point
$P_0(3, 0, 0)$ in this plane. From 16, the distance between $P_0$ and the plane 2x + 4y - 4z = 7 is


$D = \dfrac{|2(3) + 4(0) + (-4)(0) - 7|}{\sqrt{2^2 + 4^2 + (-4)^2}} = \dfrac{1}{6}$
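The two-step procedure of Example 8 (pick a point in one plane, then measure its distance to the other) can be sketched as follows. The function names are illustrative only, and planes are assumed to be given in the form n · x = rhs.

```python
import math

def point_on_plane(normal, rhs):
    """Return a point on the plane n . x = rhs by setting all but one coordinate to 0."""
    for i, c in enumerate(normal):
        if c != 0:
            point = [0.0] * len(normal)
            point[i] = rhs / c
            return tuple(point)
    raise ValueError("normal vector must be nonzero")

def distance_to_plane(point, normal, rhs):
    """Distance from a point to the plane n . x = rhs (Formula 16 with d = -rhs)."""
    numerator = abs(sum(n * p for n, p in zip(normal, point)) - rhs)
    return numerator / math.sqrt(sum(n * n for n in normal))

# Example 8: planes x + 2y - 2z = 3 and 2x + 4y - 4z = 7.
p0 = point_on_plane((1, 2, -2), 3)        # the point (3, 0, 0) used in the text
D = distance_to_plane(p0, (2, 4, -4), 7)
print(p0, D)  # D is approximately 0.166667, that is, 1/6
```

Any other point of the first plane would give the same value of D, which is what makes the choice of $P_0$ arbitrary.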


Concept Review 




Orthogonal (perpendicular) vectors 
Orthogonal set of vectors 

Normal to a line 

Normal to a plane 

Point-normal equations 

Vector form of a line 

Vector form of a plane 

Orthogonal projection of u on a 
Vector component of u along a 

Vector component of u orthogonal to a 


Theorem of Pythagoras 


Skills 


Determine whether two vectors are orthogonal. 

Determine whether a given set of vectors forms an orthogonal set. 

Find equations for lines (or planes) by using a normal vector and a point on the line (or plane). 
Find the vector form of a line or plane through the origin. 


Compute the vector component of u along a and orthogonal to a. 


Find the distance between a point and a line in $R^2$ or $R^3$.

Find the distance between two parallel planes in $R^3$.

Find the distance between a point and a plane.


Exercise Set 3.3 


In Exercises 1—2, determine whether u and v are orthogonal vectors. 


1. (a) u = (6, 1, 4), v = (2, 0, -3)
(b) u = (0, 0, -1), v = (1, 1, 1)
(c) u = (-6, 0, 4), v = (3, 1, 6)
(d) u = (2, 4, -8), v = (5, 3, 7)


Answer: 


(a) Orthogonal 

(b) Not orthogonal 
(c) Not orthogonal 
(d) Not orthogonal 


2. (a) u = (2, 3), v = (5, -7)
(b) u = (-6, -2), v = (4, 0)
(c) u = (1, -5, 4), v = (3, 3, 3)
(d) u = (-2, 2, 3), v = (1, 7, -4)


In Exercises 3-4, determine whether the vectors form an orthogonal set. 


3. (a) v1 = (2, 3), v2 = (3, 2)
(b) v1 = (-1, 1), v2 = (1, 1)
(c) v1 = (-2, 1, 1), v2 = (1, 0, 2), v3 = (-2, -5, 1)
(d) v1 = (-3, 4, -1), v2 = (1, 2, 5), v3 = (4, -3, 0)


Answer: 


(a) Not an orthogonal set 
(b) Orthogonal set 
(c) Orthogonal set 
(d) Not an orthogonal set 


4. (a) v1 = (2, 3), v2 = (-3, 2)
(b) v1 = (1, -2), v2 = (-2, 1)
(c) v1 = (1, 0, 1), v2 = (1, 1, 1), v3 = (-1, 0, 1)
(d) v1 = (2, -2, 1), v2 = (2, 1, -2), v3 = (1, 2, 2)
5. Find a unit vector that is orthogonal to both u = (1, 0, 1) and v = (0, 1, 1).


Answer: 


$\pm\tfrac{1}{\sqrt{3}}(1, 1, -1)$


6. (a) Show that v = (a, b) and w = (-b, a) are orthogonal vectors.


(b) Use the result in part (a) to find two vectors that are orthogonal to v = (2, -3).


(c) Find two unit vectors that are orthogonal to (-3, 4).


7. Do the points A(1, 1, 1), B(-2, 0, 3), and C(-3, -1, 1) form the vertices of a right triangle? Explain your answer.


Answer: 


Yes 


8. Repeat Exercise 7 for the points A(3, 0, 2), B(4, 3, 0), and C(8, 1, -1).


In Exercises 9-12, find a point-normal form of the equation of the plane passing through P and having n as a normal. 


9. P(-1, 3, -2); n = (-2, 1, -1)
Answer:


-2(x + 1) + (y - 3) - (z + 2) = 0
10. P(1, 1, 4); n = (1, 9, 8)
11. P(2, 0, 0); n = (0, 0, 2)

Answer:


2z = 0
12. P(0, 0, 0); n = (1, 2, 3)


In Exercises 13-16, determine whether the given planes are parallel. 


13. 4x — y + 2z = 5 and 7x — 3y + 4z=8 


Answer: 

Not parallel 
14. x - 4y - 3z - 2 = 0 and 3x - 12y - 9z - 7 = 0
15. 2y = 8x - 4z + 5 and $x = \tfrac{1}{2}z + \tfrac{1}{4}y$

Answer: 

Parallel 


16. (-4, 1, 2) · (x, y, z) = 0 and (8, -2, -4) · (x, y, z) = 0
In Exercises 17-18, determine whether the given planes are perpendicular.
17. 3x - y + z - 4 = 0, x + 2z = -1
Answer:
Not perpendicular
18. x - 2y + 3z = 4, -2x + 5y + 4z = -1
In Exercises 19-20, find $\|\mathrm{proj}_{\mathbf{a}}\mathbf{u}\|$.


19. (a) u = (1, -2), a = (-4, -3)
(b) u = (3, 0, 4), a = (2, 3, 3)


20. (a) u = (5, 6), a = (2, -1)

(b) u = (3, -2, 6), a = (1, 2, -7)
In Exercises 21—28, find the vector component of u along a and the vector component of u orthogonal to a. 
21. u = (6, 2), a = (3, -9)

Answer:


(0, 0), (6, 2)
22. u = (-1, -2), a = (-2, 3)
23. u = (3, 1, -7), a = (1, 9, 5)


Answer: 


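24. u = (1, 0, 0), a = (4, 3, 8)
25. u = (1, 1, 1), a = (0, 2, -1)


Answer:
$\left(0, \tfrac{2}{5}, -\tfrac{1}{5}\right)$, $\left(1, \tfrac{3}{5}, \tfrac{6}{5}\right)$
26. u = (2, 0, 1), a = (1, 2, 3)
27. u = (2, 1, 1, 2), a = (4, -4, 2, -2)


Answer:
$\left(\tfrac{1}{5}, -\tfrac{1}{5}, \tfrac{1}{10}, -\tfrac{1}{10}\right)$, $\left(\tfrac{9}{5}, \tfrac{6}{5}, \tfrac{9}{10}, \tfrac{21}{10}\right)$
28. u = (5, 0, -3, 7), a = (2, 1, -1, -1)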
In Exercises 29-32, find the distance between the point and the line.
29. 4x + 3y + 4 = 0; (-3, 1)
Answer:


1
30. x - 3y + 2 = 0; (-1, 4)
31. y = -4x + 2; (2, -5)


Answer:


$\dfrac{1}{\sqrt{17}}$
32. 3x + y = 5; (1, 8)


In Exercises 33-36, find the distance between the point and the plane.


33. (3, 1, -2); x + 2y - 2z = 4
Answer:


$\dfrac{5}{3}$
34. (-1, -1, 2); 2x + 5y - 6z = 4
35. (-1, 2, 1); 2x + 3y - 4z = 1


Answer:


$\dfrac{1}{\sqrt{29}}$
36. (0, 3, -2); x - y - z = 3


In Exercises 37-40, find the distance between the given parallel planes.


37. 2x - y - z = 5 and -4x + 2y + 2z = 12
Answer:


$\dfrac{11}{\sqrt{6}}$
38. 3x - 4y + z = 1 and 6x - 8y + 2z = 3
39. -4x + y - 3z = 0 and 8x - 2y + 6z = 0


Answer:


0 (The planes coincide.)
40. 2x - y + z = 1 and 2x - y + z = -1


41. Let i, j, and k be unit vectors along the positive x, y, and z axes of a rectangular coordinate system in 3-space. If v = (a, b, c)
is a nonzero vector, then the angles $\alpha$, $\beta$, and $\gamma$ between v and the vectors i, j, and k, respectively, are called the direction
angles of v (Figure Ex-41), and the numbers $\cos\alpha$, $\cos\beta$, and $\cos\gamma$ are called the direction cosines of v.


(a) Show that $\cos\alpha = a / \|\mathbf{v}\|$.
(b) Find $\cos\beta$ and $\cos\gamma$.
(c) Show that $\mathbf{v} / \|\mathbf{v}\| = (\cos\alpha, \cos\beta, \cos\gamma)$.
(d) Show that $\cos^2\alpha + \cos^2\beta + \cos^2\gamma = 1$.


Figure Ex-41


Answer:


(b) $\cos\beta = \dfrac{b}{\|\mathbf{v}\|}$, $\cos\gamma = \dfrac{c}{\|\mathbf{v}\|}$


42. Use the result in Exercise 41 to estimate, to the nearest degree, the angles that a diagonal of a box with dimensions
10 cm × 15 cm × 25 cm makes with the edges of the box.
43. Show that if v is orthogonal to both $\mathbf{w}_1$ and $\mathbf{w}_2$, then v is orthogonal to $k_1\mathbf{w}_1 + k_2\mathbf{w}_2$ for all scalars $k_1$ and $k_2$.


44. Let u and v be nonzero vectors in 2- or 3-space, and let $k = \|\mathbf{u}\|$ and $l = \|\mathbf{v}\|$. Show that the vector $\mathbf{w} = l\mathbf{u} + k\mathbf{v}$ bisects the
angle between u and v.


45. Prove part (a) of Theorem 3.3.4. 


46. Is it possible to have
$\mathrm{proj}_{\mathbf{a}}\mathbf{u} = \mathrm{proj}_{\mathbf{u}}\mathbf{a}$?


Explain your reasoning. 
True-False Exercises 
In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 
(a) The vectors (3, -1, 2) and (0, 0, 0) are orthogonal.

Answer: 


True 


(b) If u and v are orthogonal vectors, then for all nonzero scalars k and m, ku and mv are orthogonal vectors.
Answer: 


True 


(c) The orthogonal projection of u along a is perpendicular to the vector component of u orthogonal to a. 
Answer: 


True 
(d) If a and b are orthogonal vectors, then for every nonzero vector u, we have
$\mathrm{proj}_{\mathbf{a}}(\mathrm{proj}_{\mathbf{b}}(\mathbf{u})) = \mathbf{0}$


Answer: 


True 
(e) If a and u are nonzero vectors, then
$\mathrm{proj}_{\mathbf{a}}(\mathrm{proj}_{\mathbf{a}}(\mathbf{u})) = \mathrm{proj}_{\mathbf{a}}(\mathbf{u})$


Answer: 


True 
(f) If the relationship
$\mathrm{proj}_{\mathbf{a}}\mathbf{u} = \mathrm{proj}_{\mathbf{a}}\mathbf{v}$


holds for some nonzero vector a, then u = v.
Answer: 


False 


(g) For all vectors u and v, it is true that
$\|\mathbf{u} + \mathbf{v}\| = \|\mathbf{u}\| + \|\mathbf{v}\|$


Answer: 


False 




3.4 The Geometry of Linear Systems 


In this section we will use parametric and vector methods to study general systems of linear equations. This work will enable us to interpret
solution sets of linear systems with n unknowns as geometric objects in $R^n$ just as we interpreted solution sets of linear systems with two
and three unknowns as points, lines, and planes in $R^2$ and $R^3$.


Vector and Parametric Equations of Lines in $R^2$ and $R^3$


In the last section we derived equations of lines and planes that are determined by a point and a normal vector. However, there are other
useful ways of specifying lines and planes. For example, a unique line in $R^2$ or $R^3$ is determined by a point $\mathbf{x}_0$ on the line and a nonzero
vector v parallel to the line, and a unique plane in $R^3$ is determined by a point $\mathbf{x}_0$ in the plane and two noncollinear vectors $\mathbf{v}_1$ and $\mathbf{v}_2$
parallel to the plane. The best way to visualize this is to translate the vectors so their initial points are at $\mathbf{x}_0$ (Figure 3.4.1).


Figure 3.4.1 


Let us begin by deriving an equation for the line L that contains the point $\mathbf{x}_0$ and is parallel to v. If x is a general point on such a line, then,
as illustrated in Figure 3.4.2, the vector $\mathbf{x} - \mathbf{x}_0$ will be some scalar multiple of v, say
$\mathbf{x} - \mathbf{x}_0 = t\mathbf{v}$ or equivalently $\mathbf{x} = \mathbf{x}_0 + t\mathbf{v}$


As the variable t (called a parameter) varies from $-\infty$ to $\infty$, the point x traces out the line L. Accordingly, we have the following result.


THEOREM 3.4.1 


Let L be the line in $R^2$ or $R^3$ that contains the point $\mathbf{x}_0$ and is parallel to the nonzero vector v. Then the equation of the line through
$\mathbf{x}_0$ that is parallel to v is
$\mathbf{x} = \mathbf{x}_0 + t\mathbf{v}$ (1)
If $\mathbf{x}_0 = \mathbf{0}$, then the line passes through the origin and the equation has the form


$\mathbf{x} = t\mathbf{v}$ (2)


Although it is not stated explicitly, it is understood in
Formulas 1 and 2 that the parameter t varies from $-\infty$ to $\infty$.
This applies to all vector and parametric equations in this text
except where stated otherwise.


Figure 3.4.2 


Vector and Parametric Equations of Planes in $R^3$


Next we will derive an equation for the plane W that contains the point $\mathbf{x}_0$ and is parallel to the noncollinear vectors $\mathbf{v}_1$ and $\mathbf{v}_2$. As shown in
Figure 3.4.3, if x is any point in the plane, then by forming suitable scalar multiples of $\mathbf{v}_1$ and $\mathbf{v}_2$, say $t_1\mathbf{v}_1$ and $t_2\mathbf{v}_2$, we can create a
parallelogram with diagonal $\mathbf{x} - \mathbf{x}_0$ and adjacent sides $t_1\mathbf{v}_1$ and $t_2\mathbf{v}_2$. Thus, we have

$\mathbf{x} - \mathbf{x}_0 = t_1\mathbf{v}_1 + t_2\mathbf{v}_2$ or equivalently $\mathbf{x} = \mathbf{x}_0 + t_1\mathbf{v}_1 + t_2\mathbf{v}_2$


Figure 3.4.3 


As the variables $t_1$ and $t_2$ (called parameters) vary independently from $-\infty$ to $\infty$, the point x varies over the entire plane W. Accordingly,
we have the following result.


THEOREM 3.4.2 


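Let W be the plane in $R^3$ that contains the point $\mathbf{x}_0$ and is parallel to the noncollinear vectors $\mathbf{v}_1$ and $\mathbf{v}_2$. Then an equation of the
plane through $\mathbf{x}_0$ that is parallel to $\mathbf{v}_1$ and $\mathbf{v}_2$ is given by


$\mathbf{x} = \mathbf{x}_0 + t_1\mathbf{v}_1 + t_2\mathbf{v}_2$ (3)


If $\mathbf{x}_0 = \mathbf{0}$, then the plane passes through the origin and the equation has the form


$\mathbf{x} = t_1\mathbf{v}_1 + t_2\mathbf{v}_2$ (4)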


Remark Observe that the line through $\mathbf{x}_0$ represented by Equation 1 is the translation by $\mathbf{x}_0$ of the line through the origin represented by
Equation 2 and that the plane through $\mathbf{x}_0$ represented by Equation 3 is the translation by $\mathbf{x}_0$ of the plane through the origin represented by
Equation 4 (Figure 3.4.4).


Figure 3.4.4


Motivated by the forms of Formulas 1 to 4, we can extend the notions of line and plane to $R^n$ by making the following definitions.


DEFINITION 1


If $\mathbf{x}_0$ and v are vectors in $R^n$, and if v is nonzero, then the equation
$\mathbf{x} = \mathbf{x}_0 + t\mathbf{v}$ (5)


defines the line through $\mathbf{x}_0$ that is parallel to v. In the special case where $\mathbf{x}_0 = \mathbf{0}$, the line is said to pass through the origin.


DEFINITION 2 


If $\mathbf{x}_0$, $\mathbf{v}_1$, and $\mathbf{v}_2$ are vectors in $R^n$, and if $\mathbf{v}_1$ and $\mathbf{v}_2$ are not collinear, then the equation
$\mathbf{x} = \mathbf{x}_0 + t_1\mathbf{v}_1 + t_2\mathbf{v}_2$ (6)


defines the plane through $\mathbf{x}_0$ that is parallel to $\mathbf{v}_1$ and $\mathbf{v}_2$. In the special case where $\mathbf{x}_0 = \mathbf{0}$, the plane is said to pass through the
origin.


Equations 5 and 6 are called vector forms of a line and plane in $R^n$. If the vectors in these equations are expressed in terms of their
components and the corresponding components on each side are equated, then the resulting equations are called parametric equations of
the line and plane. Here are some examples.


EXAMPLE 1 Vector and Parametric Equations of Lines in $R^2$ and $R^3$


(a) Find a vector equation and parametric equations of the line in $R^2$ that passes through the origin and is parallel to the
vector v = (-2, 3).

(b) Find a vector equation and parametric equations of the line in $R^3$ that passes through the point $P_0(1, 2, -3)$ and is
parallel to the vector v = (4, -5, 1).


(c) Use the vector equation obtained in part (b) to find two points on the line that are different from $P_0$.


Solution
(a) It follows from 5 with $\mathbf{x}_0 = \mathbf{0}$ that a vector equation of the line is $\mathbf{x} = t\mathbf{v}$. If we let x = (x, y), then this equation can be
expressed in vector form as
(x, y) = t(-2, 3)


Equating corresponding components on the two sides of this equation yields the parametric equations


x = -2t, y = 3t


(b) It follows from 5 that a vector equation of the line is $\mathbf{x} = \mathbf{x}_0 + t\mathbf{v}$. If we let x = (x, y, z), and if we take
$\mathbf{x}_0 = (1, 2, -3)$, then this equation can be expressed in vector form as


(x, y, z) = (1, 2, -3) + t(4, -5, 1) (7)


Equating corresponding components on the two sides of this equation yields the parametric equations
x = 1 + 4t, y = 2 - 5t, z = -3 + t


(c) A point on the line represented by Equation 7 can be obtained by substituting a specific numerical value for the
parameter t. However, since t = 0 produces (x, y, z) = (1, 2, -3), which is the point $P_0$, this value of t does not serve
our purpose. Taking t = 1 produces the point (5, -3, -2) and taking t = -1 produces the point (-3, 7, -4). Any
other distinct values for t (except t = 0) would work just as well.
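Part (c) amounts to evaluating $\mathbf{x}_0 + t\mathbf{v}$ at chosen parameter values. A minimal sketch in plain Python follows; the function name `point_on_line` is an illustrative choice, not notation from the text.

```python
def point_on_line(x0, v, t):
    """Return the point x0 + t*v on the line through x0 parallel to v."""
    return tuple(a + t * b for a, b in zip(x0, v))

x0 = (1, 2, -3)   # the point P0 from part (b)
v = (4, -5, 1)    # direction vector from part (b)

for t in (0, 1, -1):
    print(t, point_on_line(x0, v, t))
# t = 0 gives (1, 2, -3); t = 1 gives (5, -3, -2); t = -1 gives (-3, 7, -4)
```

The same function works unchanged for lines in $R^n$, since it only pairs up corresponding components.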


EXAMPLE 2 Vector and Parametric Equations of a Plane in $R^3$


Find vector and parametric equations of the plane x - y + 2z = 5.


Solution We will find the parametric equations first. We can do this by solving the equation for any one of the variables in
terms of the other two and then using those two variables as parameters. For example, solving for x in terms of y and z yields


x = 5 + y - 2z (8)


and then using y and z as parameters $t_1$ and $t_2$, respectively, yields the parametric equations
$x = 5 + t_1 - 2t_2$, $y = t_1$, $z = t_2$


We would have obtained different parametric and
vector equations in Example 2 had we solved 8 for y or
z rather than x. However, one can show the same plane
results in all three cases as the parameters vary from
$-\infty$ to $\infty$.


To obtain a vector equation of the plane we rewrite these parametric equations as
$(x, y, z) = (5 + t_1 - 2t_2,\; t_1,\; t_2)$
or, equivalently, as


$(x, y, z) = (5, 0, 0) + t_1(1, 1, 0) + t_2(-2, 0, 1)$
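As a quick check on Example 2, one can substitute points generated by the vector equation back into x - y + 2z = 5. The sketch below uses a hypothetical helper `plane_point` and a few arbitrarily chosen parameter pairs.

```python
def plane_point(t1, t2):
    """Point (5, 0, 0) + t1*(1, 1, 0) + t2*(-2, 0, 1) from Example 2."""
    x0, v1, v2 = (5, 0, 0), (1, 1, 0), (-2, 0, 1)
    return tuple(p + t1 * a + t2 * b for p, a, b in zip(x0, v1, v2))

for t1, t2 in [(0, 0), (1, 0), (0, 1), (3, -2)]:
    x, y, z = plane_point(t1, t2)
    print((x, y, z), x - y + 2 * z)  # the second value is 5 each time
```

A finite sample cannot prove that every point of the plane is reached, but it does confirm that the parametrization never leaves the plane for the values tried.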


EXAMPLE 3 Vector and Parametric Equations of Lines and Planes in $R^4$


(a) Find vector and parametric equations of the line through the origin of $R^4$ that is parallel to the vector v = (5, -3, 6, 1).


(b) Find vector and parametric equations of the plane in $R^4$ that passes through the point $\mathbf{x}_0 = (2, -1, 0, 3)$ and is parallel
to both $\mathbf{v}_1 = (1, 5, 2, -4)$ and $\mathbf{v}_2 = (0, 7, -8, 6)$.


Solution
(a) If we let $\mathbf{x} = (x_1, x_2, x_3, x_4)$, then the vector equation $\mathbf{x} = t\mathbf{v}$ can be expressed as
$(x_1, x_2, x_3, x_4) = t(5, -3, 6, 1)$
Equating corresponding components yields the parametric equations
$x_1 = 5t$, $x_2 = -3t$, $x_3 = 6t$, $x_4 = t$


(b) The vector equation $\mathbf{x} = \mathbf{x}_0 + t_1\mathbf{v}_1 + t_2\mathbf{v}_2$ can be expressed as
$(x_1, x_2, x_3, x_4) = (2, -1, 0, 3) + t_1(1, 5, 2, -4) + t_2(0, 7, -8, 6)$


which yields the parametric equations


$x_1 = 2 + t_1$
$x_2 = -1 + 5t_1 + 7t_2$
$x_3 = 2t_1 - 8t_2$
$x_4 = 3 - 4t_1 + 6t_2$


Lines Through Two Points in $R^n$


If $\mathbf{x}_0$ and $\mathbf{x}_1$ are distinct points in $R^n$, then the line determined by these points is parallel to the vector $\mathbf{v} = \mathbf{x}_1 - \mathbf{x}_0$ (Figure 3.4.5), so it
follows from 5 that the line can be expressed in vector form as


$\mathbf{x} = \mathbf{x}_0 + t(\mathbf{x}_1 - \mathbf{x}_0)$ (9)
or, equivalently, as
$\mathbf{x} = (1 - t)\mathbf{x}_0 + t\mathbf{x}_1$ (10)
These are called the two-point vector equations of a line in $R^n$.
EXAMPLE 4 A Line Through Two Points in $R^2$
Find vector and parametric equations for the line in $R^2$ that passes through the points P(0, 7) and Q(5, 0).


Solution We will see below that it does not matter which point we take to be $\mathbf{x}_0$ and which we take to be $\mathbf{x}_1$, so let us
choose $\mathbf{x}_0 = (0, 7)$ and $\mathbf{x}_1 = (5, 0)$. It follows that $\mathbf{x}_1 - \mathbf{x}_0 = (5, -7)$ and hence that


(x, y) = (0, 7) + t(5, -7) (11)


which we can rewrite in parametric form as
x = 5t, y = 7 - 7t


Had we reversed our choices and taken $\mathbf{x}_0 = (5, 0)$ and $\mathbf{x}_1 = (0, 7)$, then the resulting vector equation would have been
(x, y) = (5, 0) + t(-5, 7) (12)


and the parametric equations would have been
x = 5 - 5t, y = 7t
(verify). Although 11 and 12 look different, they both represent the line whose equation in rectangular coordinates is
7x + 5y = 35


(Figure 3.4.6). This can be seen by eliminating the parameter t from the parametric equations (verify).
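The claim that both orderings of the two points describe the line 7x + 5y = 35 can be spot-checked numerically. The sketch below evaluates the two-point form (Formula 10) with a hypothetical helper `two_point_line`; the sample parameter values are arbitrary.

```python
def two_point_line(x0, x1, t):
    """Return (1 - t)*x0 + t*x1, the two-point form of a line (Formula 10)."""
    return tuple((1 - t) * a + t * b for a, b in zip(x0, x1))

P, Q = (0, 7), (5, 0)

for t in (0, 0.5, 1, 2):
    x, y = two_point_line(P, Q, t)    # ordering used in Formula 11
    u, w = two_point_line(Q, P, t)    # reversed ordering, Formula 12
    print(t, (x, y), 7 * x + 5 * y, (u, w), 7 * u + 5 * w)
# both 7x + 5y values equal 35 for every t
```

Restricting t to the interval from 0 to 1 in the same function yields only the points of the segment joining P and Q.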


Figure 3.4.5


Figure 3.4.6


The point x = (x, y) in Equations 9 and 10 traces an entire line in $R^2$ as the parameter t varies over the interval $(-\infty, \infty)$. If, however,
we restrict the parameter to vary from t = 0 to t = 1, then x will not trace the entire line but rather just the line segment joining the points
$\mathbf{x}_0$ and $\mathbf{x}_1$. The point x will start at $\mathbf{x}_0$ when t = 0 and end at $\mathbf{x}_1$ when t = 1. Accordingly, we make the following definition.


DEFINITION 3


If $\mathbf{x}_0$ and $\mathbf{x}_1$ are vectors in $R^n$, then the equation
$\mathbf{x} = \mathbf{x}_0 + t(\mathbf{x}_1 - \mathbf{x}_0)$ $(0 \le t \le 1)$ (13)
defines the line segment from $\mathbf{x}_0$ to $\mathbf{x}_1$. When convenient, Equation 13 can be written as


$\mathbf{x} = (1 - t)\mathbf{x}_0 + t\mathbf{x}_1$ $(0 \le t \le 1)$ (14)


EXAMPLE 5 A Line Segment from One Point to Another in $R^2$


It follows from 13 and 14 that the line segment in $R^2$ from $\mathbf{x}_0 = (1, -3)$ to $\mathbf{x}_1 = (5, 6)$ can be represented either by the
equation


$\mathbf{x} = (1, -3) + t(4, 9)$ $(0 \le t \le 1)$


or by
$\mathbf{x} = (1 - t)(1, -3) + t(5, 6)$ $(0 \le t \le 1)$


Dot Product Form of a Linear System 


Our next objective is to show how to express linear equations and linear systems in dot product notation. This will lead us to some 
important results about orthogonality and linear systems. 


Recall that a linear equation in the variables $x_1, x_2, \ldots, x_n$ has the form


$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b$ ($a_1, a_2, \ldots, a_n$ not all zero) (15)


and that the corresponding homogeneous equation is


$a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0$ ($a_1, a_2, \ldots, a_n$ not all zero) (16)


These equations can be rewritten in vector form by letting
$\mathbf{a} = (a_1, a_2, \ldots, a_n)$ and $\mathbf{x} = (x_1, x_2, \ldots, x_n)$


in which case Formula 15 can be written as

$\mathbf{a} \cdot \mathbf{x} = b$ (17)
and Formula 16 as

$\mathbf{a} \cdot \mathbf{x} = 0$ (18)
Except for a notational change from n to a, Formula 18 is the extension to $R^n$ of Formula 6 in Section 3.3. This equation reveals that each
solution vector x of a homogeneous equation is orthogonal to the coefficient vector a. To take this geometric observation a step further,
consider the homogeneous system


$a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = 0$
$a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = 0$
$\vdots$
$a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = 0$


If we denote the successive row vectors of the coefficient matrix by $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_m$, then we can rewrite this system in dot product form as


$\mathbf{r}_1 \cdot \mathbf{x} = 0$
$\mathbf{r}_2 \cdot \mathbf{x} = 0$
$\vdots$ (19)
$\mathbf{r}_m \cdot \mathbf{x} = 0$


from which we see that every solution vector x is orthogonal to every row vector of the coefficient matrix. In summary, we have the
following result.


THEOREM 3.4.3


If A is an $m \times n$ matrix, then the solution set of the homogeneous linear system $A\mathbf{x} = \mathbf{0}$ consists of all vectors in $R^n$ that are
orthogonal to every row vector of A.


EXAMPLE 6 Orthogonality of Row Vectors and Solution Vectors


We showed in Example 6 of Section 1.2 that the general solution of the homogeneous linear system


$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$

is
$x_1 = -3r - 4s - 2t$, $x_2 = r$, $x_3 = -2s$, $x_4 = s$, $x_5 = t$, $x_6 = 0$


which we can rewrite in vector form as
$\mathbf{x} = (-3r - 4s - 2t,\; r,\; -2s,\; s,\; t,\; 0)$


According to Theorem 3.4.3, the vector x must be orthogonal to each of the row vectors
$\mathbf{r}_1 = (1, 3, -2, 0, 2, 0)$
$\mathbf{r}_2 = (2, 6, -5, -2, 4, -3)$
$\mathbf{r}_3 = (0, 0, 5, 10, 0, 15)$
$\mathbf{r}_4 = (2, 6, 0, 8, 4, 18)$


We will confirm that x is orthogonal to $\mathbf{r}_1$, and leave it for you to verify that x is orthogonal to the other three row vectors as
well. The dot product of $\mathbf{r}_1$ and x is


$\mathbf{r}_1 \cdot \mathbf{x} = 1(-3r - 4s - 2t) + 3(r) + (-2)(-2s) + 0(s) + 2(t) + 0(0) = 0$
which establishes the orthogonality.
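The verification left to the reader for the other three rows can be sketched numerically. The following plain-Python check substitutes one arbitrary choice of the parameters r, s, and t; it illustrates Theorem 3.4.3 for that choice rather than proving it for all values.

```python
# Row vectors of the coefficient matrix in Example 6.
rows = [
    (1, 3, -2, 0, 2, 0),
    (2, 6, -5, -2, 4, -3),
    (0, 0, 5, 10, 0, 15),
    (2, 6, 0, 8, 4, 18),
]

def solution(r, s, t):
    """General solution of the homogeneous system in Example 6."""
    return (-3 * r - 4 * s - 2 * t, r, -2 * s, s, t, 0)

def dot(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

x = solution(2, -1, 5)                 # arbitrary parameter values
print(x)                               # (-12, 2, 2, -1, 5, 0)
print([dot(row, x) for row in rows])   # [0, 0, 0, 0]
```

Trying other integer values of r, s, and t gives the same list of zeros, in line with the symbolic computation for $\mathbf{r}_1$ above.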


The Relationship Between Ax = 0 and Ax = b 


We will conclude this section by exploring the relationship between the solutions of a homogeneous linear system $A\mathbf{x} = \mathbf{0}$ and the solutions
(if any) of a nonhomogeneous linear system $A\mathbf{x} = \mathbf{b}$ that has the same coefficient matrix. These are called corresponding linear systems.


To motivate the result we are seeking, let us compare the solutions of the corresponding linear systems


$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \\ 5 \\ 6 \end{bmatrix}$


We showed in Example 5 and Example 6 of Section 1.2 that the general solutions of these linear systems can be written in parametric form
as


homogeneous → $x_1 = -3r - 4s - 2t$, $x_2 = r$, $x_3 = -2s$, $x_4 = s$, $x_5 = t$, $x_6 = 0$
nonhomogeneous → $x_1 = -3r - 4s - 2t$, $x_2 = r$, $x_3 = -2s$, $x_4 = s$, $x_5 = t$, $x_6 = \tfrac{1}{3}$
which we can then rewrite in vector form as


homogeneous → $(x_1, x_2, x_3, x_4, x_5, x_6) = (-3r - 4s - 2t,\; r,\; -2s,\; s,\; t,\; 0)$
nonhomogeneous → $(x_1, x_2, x_3, x_4, x_5, x_6) = (-3r - 4s - 2t,\; r,\; -2s,\; s,\; t,\; \tfrac{1}{3})$


By splitting the vectors on the right apart and collecting terms with like parameters, we can rewrite these equations as


homogeneous → $(x_1, x_2, x_3, x_4, x_5, x_6) = r(-3, 1, 0, 0, 0, 0) + s(-4, 0, -2, 1, 0, 0) + t(-2, 0, 0, 0, 1, 0)$ (20)
nonhomogeneous → $(x_1, x_2, x_3, x_4, x_5, x_6) = r(-3, 1, 0, 0, 0, 0) + s(-4, 0, -2, 1, 0, 0) + t(-2, 0, 0, 0, 1, 0) + (0, 0, 0, 0, 0, \tfrac{1}{3})$ (21)


Formulas 20 and 21 reveal that each solution of the nonhomogeneous system can be obtained by adding the fixed vector $(0, 0, 0, 0, 0, \tfrac{1}{3})$
to the corresponding solution of the homogeneous system. This is a special case of the following general result.


THEOREM 3.4.4 


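The general solution of a consistent linear system $A\mathbf{x} = \mathbf{b}$ can be obtained by adding any specific solution of $A\mathbf{x} = \mathbf{b}$ to the general
solution of $A\mathbf{x} = \mathbf{0}$.


Proof Let $\mathbf{x}_0$ be any specific solution of $A\mathbf{x} = \mathbf{b}$, let W denote the solution set of $A\mathbf{x} = \mathbf{0}$, and let $\mathbf{x}_0 + W$ denote the set of all vectors that
result by adding $\mathbf{x}_0$ to each vector in W. We must show that if x is a vector in $\mathbf{x}_0 + W$, then x is a solution of $A\mathbf{x} = \mathbf{b}$, and conversely, that
every solution of $A\mathbf{x} = \mathbf{b}$ is in the set $\mathbf{x}_0 + W$.


Assume first that x is a vector in $\mathbf{x}_0 + W$. This implies that x is expressible in the form $\mathbf{x} = \mathbf{x}_0 + \mathbf{w}$, where $A\mathbf{x}_0 = \mathbf{b}$ and $A\mathbf{w} = \mathbf{0}$. Thus,
$A\mathbf{x} = A(\mathbf{x}_0 + \mathbf{w}) = A\mathbf{x}_0 + A\mathbf{w} = \mathbf{b} + \mathbf{0} = \mathbf{b}$


which shows that x is a solution of $A\mathbf{x} = \mathbf{b}$.


Conversely, let x be any solution of $A\mathbf{x} = \mathbf{b}$. To show that x is in the set $\mathbf{x}_0 + W$ we must show that x is expressible in the form
$\mathbf{x} = \mathbf{x}_0 + \mathbf{w}$ (22)


where w is in W (i.e., $A\mathbf{w} = \mathbf{0}$). We can do this by taking $\mathbf{w} = \mathbf{x} - \mathbf{x}_0$. This vector obviously satisfies 22, and it is in W since
$A\mathbf{w} = A(\mathbf{x} - \mathbf{x}_0) = A\mathbf{x} - A\mathbf{x}_0 = \mathbf{b} - \mathbf{b} = \mathbf{0}$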
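The first half of the proof can be illustrated with the corresponding systems used earlier in this section. The sketch below uses exact rational arithmetic; the helper names are illustrative, and the right-hand side b is computed from the fixed vector $(0, 0, 0, 0, 0, \tfrac{1}{3})$ of Formula 21 rather than entered separately.

```python
from fractions import Fraction

# Coefficient matrix shared by the two corresponding systems.
A = [
    (1, 3, -2, 0, 2, 0),
    (2, 6, -5, -2, 4, -3),
    (0, 0, 5, 10, 0, 15),
    (2, 6, 0, 8, 4, 18),
]

def matvec(matrix, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in matrix)

def homogeneous_solution(r, s, t):
    """General solution of Ax = 0 (Formula 20)."""
    return (-3 * r - 4 * s - 2 * t, r, -2 * s, s, t, 0)

x0 = (0, 0, 0, 0, 0, Fraction(1, 3))   # fixed vector from Formula 21
b = matvec(A, x0)                      # right-hand side that x0 solves
w = homogeneous_solution(1, 2, -3)     # arbitrary parameter values
x = tuple(p + q for p, q in zip(x0, w))

print(b)                   # entries equal 0, -1, 5, 6 (printed as Fractions)
print(matvec(A, w))        # (0, 0, 0, 0), so w solves Ax = 0
print(matvec(A, x) == b)   # True: x0 + w solves Ax = b
```

Using `Fraction` avoids floating-point rounding in the entry 1/3, so the final comparison is an exact equality test.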


Figure 3.4.7 The solution set of $A\mathbf{x} = \mathbf{b}$ is a translation of the solution space of $A\mathbf{x} = \mathbf{0}$.


Remark Theorem 3.4.4 has a useful geometric interpretation that is illustrated in Figure 3.4.7. If, as discussed in Section 3.1, we interpret
vector addition as translation, then the theorem states that if $\mathbf{x}_0$ is any specific solution of $A\mathbf{x} = \mathbf{b}$, then the entire solution set of $A\mathbf{x} = \mathbf{b}$ can
be obtained by translating the solution set of $A\mathbf{x} = \mathbf{0}$ by the vector $\mathbf{x}_0$.


Concept Review 

• Parameters

• Parametric equations of lines

• Parametric equations of planes

• Two-point vector equations of a line

• Vector equation of a line

• Vector equation of a plane

Skills

• Express the equations of lines in $R^2$ and $R^3$ using either vector or parametric equations.

• Express the equations of planes in $R^n$ using either vector or parametric equations.

• Express the equation of a line containing two given points in $R^2$ or $R^3$ using either vector or parametric equations.

• Find equations of a line and a line segment.

• Verify the orthogonality of the row vectors of a linear system of equations and a solution vector.

• Use a specific solution to the nonhomogeneous linear system $A\mathbf{x} = \mathbf{b}$ and the general solution of the corresponding linear system
$A\mathbf{x} = \mathbf{0}$ to obtain the general solution to $A\mathbf{x} = \mathbf{b}$.


Exercise Set 3.4 
In Exercises 1-4, find vector and parametric equations of the line containing the point and parallel to the vector. 
1. Point: (-4, 1); vector: v = (0, -8)

Answer:

Vector equation: (x, y) = (-4, 1) + t(0, -8);


parametric equations: x = -4, y = 1 - 8t


2. Point: (2, -1); vector: v = (-4, -2)
3. Point: (0, 0, 0); vector: v = (-3, 0, 1)


Answer:

Vector equation: (x, y, z) = t(-3, 0, 1);

parametric equations: x = -3t, y = 0, z = t
4. Point: (-9, 3, 4); vector: v = (-1, 6, 0)
In Exercises 5—8, use the given equation of a line to find a point on the line and a vector parallel to the line. 
5. x = (3 - 5t, -6 - t)

Answer:


Point: (3, -6); parallel vector: (-5, -1)


6. (x, y, z) = (4t, 7, 4 + 3t)
7. x = (1 - t)(4, 6) + t(-2, 0)


Answer:


Point: (4, 6); parallel vector: (-6, -6)
8. x = (1 - t)(0, -5, 1)


In Exercises 9-12, find vector and parametric equations of the plane containing the given point and parallel vectors. 
9. Point: (-3, 1, 0); vectors: $\mathbf{v}_1 = (0, -3, 6)$ and $\mathbf{v}_2 = (-5, 1, 2)$
Answer:


Vector equation: $(x, y, z) = (-3, 1, 0) + t_1(0, -3, 6) + t_2(-5, 1, 2)$;


parametric equations: $x = -3 - 5t_2$, $y = 1 - 3t_1 + t_2$, $z = 6t_1 + 2t_2$
10. Point: (0, 6, -2); vectors: $\mathbf{v}_1 = (0, 9, -1)$ and $\mathbf{v}_2 = (0, -3, 0)$
11. Point: (-1, 1, 4); vectors: $\mathbf{v}_1 = (6, -1, 0)$ and $\mathbf{v}_2 = (-1, 3, 1)$


Answer:


Vector equation: $(x, y, z) = (-1, 1, 4) + t_1(6, -1, 0) + t_2(-1, 3, 1)$;


parametric equations: $x = -1 + 6t_1 - t_2$, $y = 1 - t_1 + 3t_2$, $z = 4 + t_2$
12. Point: (0, 5, -4); vectors: $\mathbf{v}_1 = (0, 0, -5)$ and $\mathbf{v}_2 = (1, -3, -2)$


In Exercises 13-14, find vector and parametric equations of the line in $R^2$ that passes through the origin and is orthogonal to v.
13. v = (-2, 3)
Answer:


A possible answer is vector equation: (x, y) = t(3, 2);


parametric equations: x = 3t, y = 2t
14. v = (1, 4)
In Exercises 15-16, find vector and parametric equations of the plane in $R^3$ that passes through the origin and is orthogonal to v.


15. v = (4, 0, -5) [Hint: Construct two nonparallel vectors orthogonal to v in $R^3$.]


Answer:


A possible answer is vector equation: $(x, y, z) = t_1(0, 1, 0) + t_2(5, 0, 4)$;


parametric equations: $x = 5t_2$, $y = t_1$, $z = 4t_2$
16. v = (3, 1, -6)


In Exercises 17-20, find the general solution to the linear system and confirm that the row vectors of the coefficient matrix are orthogonal 
to the solution vectors. 


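17. $x_1 + x_2 + x_3 = 0$
$2x_1 + 2x_2 + 2x_3 = 0$
$3x_1 + 3x_2 + 3x_3 = 0$


Answer:


$x_1 = -s - t$, $x_2 = s$, $x_3 = t$
18. $x_1 + 3x_2 - 4x_3 = 0$
$2x_1 + 6x_2 - 8x_3 = 0$


19. $x_1 + 5x_2 + x_3 + 2x_4 - x_5 = 0$
$x_1 - 2x_2 - x_3 + 3x_4 + 2x_5 = 0$


Answer:
$x_1 = \tfrac{3}{7}r - \tfrac{19}{7}s - \tfrac{8}{7}t$, $x_2 = -\tfrac{2}{7}r + \tfrac{1}{7}s + \tfrac{3}{7}t$, $x_3 = r$, $x_4 = s$, $x_5 = t$


20. $x_1 + 3x_2 - 4x_3 = 0$
$x_1 + 2x_2 + 3x_3 = 0$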


21. (a) The equation x + y + z = 1 can be viewed as a linear system of one equation in three unknowns. Express a general solution of this
equation as a particular solution plus a general solution of the associated homogeneous system.


(b) Give a geometric interpretation of the result in part (a).


Answer:


(a) (1, 0, 0) + s(-1, 1, 0) + t(-1, 0, 1)
(b) a plane in $R^3$ passing through P(1, 0, 0) and parallel to (-1, 1, 0) and (-1, 0, 1)


22. (a) The equation x + y = 1 can be viewed as a linear system of one equation in two unknowns. Express a general solution of this
equation as a particular solution plus a general solution of the associated homogeneous system.


(b) Give a geometric interpretation of the result in part (a).


23. (a) Find a homogeneous linear system of two equations in three unknowns whose solution space consists of those vectors in $R^3$ that are
orthogonal to a = (1, 1, 1) and b = (-2, 3, 0).
(b) What kind of geometric object is the solution space?
(c) Find a general solution of the system obtained in part (a), and confirm that Theorem 3.4.3 holds.


Answer:
(a) x + y + z = 0
-2x + 3y = 0


(b) a line through the origin in $R^3$


(c) $x = -\tfrac{3}{5}t$, $y = -\tfrac{2}{5}t$, $z = t$


24. (a) Find a homogeneous linear system of two equations in three unknowns whose solution space consists of those vectors in $R^3$ that are
orthogonal to a = (-3, 2, -1) and b = (0, -2, -2).
(b) What kind of geometric object is the solution space?


(c) Find a general solution of the system obtained in part (a), and confirm that Theorem 3.4.3 holds.


25. Consider the linear systems


$\begin{bmatrix} 3 & 2 & -1 \\ 6 & 4 & -2 \\ -3 & -2 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$
and
$\begin{bmatrix} 3 & 2 & -1 \\ 6 & 4 & -2 \\ -3 & -2 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \\ -2 \end{bmatrix}$


(a) Find a general solution of the homogeneous system.
(b) Confirm that $x_1 = 1$, $x_2 = 0$, $x_3 = 1$ is a solution of the nonhomogeneous system.
(c) Use the results in parts (a) and (b) to find a general solution of the nonhomogeneous system.


(d) Check your result in part (c) by solving the nonhomogeneous system directly.


Answer:
(a) $x_1 = -\tfrac{2}{3}s + \tfrac{1}{3}t$, $x_2 = s$, $x_3 = t$
(c) $x_1 = 1 - \tfrac{2}{3}s + \tfrac{1}{3}t$, $x_2 = s$, $x_3 = 1 + t$
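26. Consider the linear systems
$\begin{bmatrix} 1 & -2 & 3 \\ 2 & 1 & 4 \\ 1 & -7 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$
and
$\begin{bmatrix} 1 & -2 & 3 \\ 2 & 1 & 4 \\ 1 & -7 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 7 \\ -1 \end{bmatrix}$


(a) Find a general solution of the homogeneous system.
(b) Confirm that $x_1 = 1$, $x_2 = 1$, $x_3 = 1$ is a solution of the nonhomogeneous system.
(c) Use the results in parts (a) and (b) to find a general solution of the nonhomogeneous system.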


(d) Check your result in part (c) by solving the nonhomogeneous system directly. 


In Exercises 27-28, find a general solution of the system, and use that solution to find a general solution of the associated homogeneous 
system and a particular solution of the given system. 


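27. $\begin{bmatrix} 3 & 4 & 1 & 2 \\ 6 & 8 & 2 & 5 \\ 9 & 12 & 3 & 10 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 3 \\ 7 \\ 13 \end{bmatrix}$
Answer:
$x_1 = \tfrac{1}{3} - \tfrac{4}{3}s - \tfrac{1}{3}t$, $x_2 = s$, $x_3 = t$, $x_4 = 1$; The general solution of the associated homogeneous system is
$x_1 = -\tfrac{4}{3}s - \tfrac{1}{3}t$, $x_2 = s$, $x_3 = t$, $x_4 = 0$. A particular solution of the given system is $x_1 = \tfrac{1}{3}$, $x_2 = 0$, $x_3 = 0$, $x_4 = 1$.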
x 
7879 35 6 o 4 
6 —-2 3 1 x3|> 3 
3 =1 3 14 x4 8 


True-False Exercises 


In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 


(a) The vector equation of a line can be determined from any point lying on the line and a nonzero vector parallel to the line. 
Answer: 


True 


(b) The vector equation of a plane can be determined from any point lying in the plane and a nonzero vector parallel to the plane. 
Answer: 


False 


(c) The points lying on a line through the origin in $R^2$ or $R^3$ are all scalar multiples of any nonzero vector on the line.


Answer: 


True 


(d) All solution vectors of the linear system $A\mathbf{x} = \mathbf{b}$ are orthogonal to the row vectors of the matrix A if and only if $\mathbf{b} = \mathbf{0}$.
Answer: 


True 


(e) The general solution of the nonhomogeneous linear system $A\mathbf{x} = \mathbf{b}$ can be obtained by adding b to the general solution of the
homogeneous linear system $A\mathbf{x} = \mathbf{0}$.


Answer: 


False 


(f) If $\mathbf{x}_1$ and $\mathbf{x}_2$ are two solutions of the nonhomogeneous linear system $A\mathbf{x} = \mathbf{b}$, then $\mathbf{x}_1 - \mathbf{x}_2$ is a solution of the corresponding
homogeneous linear system.


Answer: 


True 




3.5 Cross Product 


This optional section is concerned with properties of vectors in 3-space that are important to physicists and 
engineers. It can be omitted, if desired, since subsequent sections do not depend on its content. Among other 
things, we define an operation that provides a way of constructing a vector in 3-space that is perpendicular to two 
given vectors, and we give a geometric interpretation of 3 x 3 determinants. 


Cross Product of Vectors 


In Section 3.2 we defined the dot product of two vectors u and v in n-space. That operation produced a scalar as its
result. We will now define a type of vector multiplication that produces a vector as the result but which is
applicable only to vectors in 3-space.


DEFINITION 1

If $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$ are vectors in 3-space, then the cross product u × v is the vector
defined by
\[
\mathbf{u} \times \mathbf{v} = (u_2v_3 - u_3v_2,\; u_3v_1 - u_1v_3,\; u_1v_2 - u_2v_1)
\]
or, in determinant notation,
\[
\mathbf{u} \times \mathbf{v} = \left( \begin{vmatrix} u_2 & u_3 \\ v_2 & v_3 \end{vmatrix},\;
-\begin{vmatrix} u_1 & u_3 \\ v_1 & v_3 \end{vmatrix},\;
\begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix} \right) \tag{1}
\]

Remark Instead of memorizing 1, you can obtain the components of u × v as follows:

• Form the 2 × 3 matrix $\begin{bmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{bmatrix}$ whose first row contains the components of u and whose second row
contains the components of v.

• To find the first component of u × v, delete the first column and take the determinant; to find the second
component, delete the second column and take the negative of the determinant; and to find the third component,
delete the third column and take the determinant.


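EXAMPLE 1 Calculating a Cross Product
Find u × v, where u = (1, 2, −2) and v = (3, 0, 1).

Solution From either 1 or the mnemonic in the preceding remark, we have
\[
\mathbf{u} \times \mathbf{v} = \left( \begin{vmatrix} 2 & -2 \\ 0 & 1 \end{vmatrix},\;
-\begin{vmatrix} 1 & -2 \\ 3 & 1 \end{vmatrix},\;
\begin{vmatrix} 1 & 2 \\ 3 & 0 \end{vmatrix} \right) = (2, -7, -6)
\]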
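The component formula in Definition 1 translates directly into a few lines of code. The following is a minimal sketch in plain Python; the function name `cross` and the use of tuples for vectors are choices made here for illustration, not notation from the text.

```python
def cross(u, v):
    """Return u x v for 3-component vectors u and v (Formula 1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (
        u2 * v3 - u3 * v2,  # delete column 1, take the determinant
        u3 * v1 - u1 * v3,  # delete column 2, take the negative of the determinant
        u1 * v2 - u2 * v1,  # delete column 3, take the determinant
    )


u = (1, 2, -2)
v = (3, 0, 1)
print(cross(u, v))  # (2, -7, -6), the result of Example 1
```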


The following theorem gives some important relationships between the dot product and cross product and also
shows that u × v is orthogonal to both u and v.


Historical Note The cross product notation A × B was introduced by the American physicist and
mathematician J. Willard Gibbs (see p. 134) in a series of unpublished lecture notes for his students at Yale
University. It appeared in a published work for the first time in the second edition of the book Vector
Analysis, by Edwin Wilson (1879-1964), a student of Gibbs. Gibbs originally referred to
A × B as the “skew product.”


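THEOREM 3.5.1 Relationships Involving Cross Product and Dot Product

If u, v, and w are vectors in 3-space, then
(a) $\mathbf{u} \cdot (\mathbf{u} \times \mathbf{v}) = 0$  (u × v is orthogonal to u)
(b) $\mathbf{v} \cdot (\mathbf{u} \times \mathbf{v}) = 0$  (u × v is orthogonal to v)
(c) $\|\mathbf{u} \times \mathbf{v}\|^2 = \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2$  (Lagrange's identity)
(d) $\mathbf{u} \times (\mathbf{v} \times \mathbf{w}) = (\mathbf{u} \cdot \mathbf{w})\mathbf{v} - (\mathbf{u} \cdot \mathbf{v})\mathbf{w}$  (relationship between cross and dot products)
(e) $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w} = (\mathbf{u} \cdot \mathbf{w})\mathbf{v} - (\mathbf{v} \cdot \mathbf{w})\mathbf{u}$  (relationship between cross and dot products)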


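Proof (a) Let $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$. Then
\[
\begin{aligned}
\mathbf{u} \cdot (\mathbf{u} \times \mathbf{v}) &= (u_1, u_2, u_3) \cdot (u_2v_3 - u_3v_2,\; u_3v_1 - u_1v_3,\; u_1v_2 - u_2v_1) \\
&= u_1(u_2v_3 - u_3v_2) + u_2(u_3v_1 - u_1v_3) + u_3(u_1v_2 - u_2v_1) = 0
\end{aligned}
\]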


Proof (b) Similar to (a). 


Proof (c) Since
\[
\|\mathbf{u} \times \mathbf{v}\|^2 = (u_2v_3 - u_3v_2)^2 + (u_3v_1 - u_1v_3)^2 + (u_1v_2 - u_2v_1)^2 \tag{2}
\]
and
\[
\|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2 = (u_1^2 + u_2^2 + u_3^2)(v_1^2 + v_2^2 + v_3^2) - (u_1v_1 + u_2v_2 + u_3v_3)^2 \tag{3}
\]
the proof can be completed by “multiplying out” the right sides of 2 and 3 and verifying their equality.


Proof (d) and (e) See Exercises 38 and 39. 


EXAMPLE 2 u × v Is Perpendicular to u and to v

Consider the vectors
u = (1, 2, −2) and v = (3, 0, 1)
In Example 1 we showed that
u × v = (2, −7, −6)
Since
u · (u × v) = (1)(2) + (2)(−7) + (−2)(−6) = 0
and
v · (u × v) = (3)(2) + (0)(−7) + (1)(−6) = 0

u × v is orthogonal to both u and v, as guaranteed by Theorem 3.5.1.
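Parts (a), (b), and (c) of Theorem 3.5.1 can also be spot-checked numerically for the vectors of Example 2. The sketch below uses integer arithmetic so the comparisons are exact; the helper names `dot` and `cross` are illustrative choices, and a check on one pair of vectors is of course not a proof.

```python
def dot(u, v):
    """Euclidean dot product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross product of two 3-component tuples (Formula 1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)


u = (1, 2, -2)
v = (3, 0, 1)
n = cross(u, v)

# Parts (a) and (b): u x v is orthogonal to u and to v.
print(dot(u, n), dot(v, n))  # 0 0

# Part (c), Lagrange's identity: both sides are computed independently.
lhs = dot(n, n)                                # ||u x v||^2
rhs = dot(u, u) * dot(v, v) - dot(u, v) ** 2   # ||u||^2 ||v||^2 - (u . v)^2
print(lhs, rhs)  # 89 89
```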


Joseph Louis Lagrange (1736-1813) 


Historical Note Joseph Louis Lagrange was a French-Italian mathematician and astronomer. Although 
his father wanted him to become a lawyer, Lagrange was attracted to mathematics and astronomy after 
reading a memoir by the astronomer Halley. At age 16 he began to study mathematics on his own and by 
age 19 was appointed to a professorship at the Royal Artillery School in Turin. The following year he 
solved some famous problems using new methods that eventually blossomed into a branch of mathematics 
called the calculus of variations. These methods and Lagrange's applications of them to problems in 
celestial mechanics were so monumental that by age 25 he was regarded by many of his contemporaries as 
the greatest living mathematician. One of Lagrange's most famous works is a memoir, Mécanique 
Analytique, in which he reduced the theory of mechanics to a few general formulas from which all other 
necessary equations could be derived. Napoleon was a great admirer of Lagrange and showered him with 
many honors. In spite of his fame, Lagrange was a shy and modest man. On his death, he was buried with 
honor in the Pantheon. 

[Image: ©SSPL/The Image Works] 


The main arithmetic properties of the cross product are listed in the next theorem. 


THEOREM 3.5.2 Properties of Cross Product

If u, v, and w are any vectors in 3-space and k is any scalar, then:
(a) u × v = −(v × u)
(b) u × (v + w) = (u × v) + (u × w)
(c) (u + v) × w = (u × w) + (v × w)
(d) k(u × v) = (ku) × v = u × (kv)
(e) u × 0 = 0 × u = 0
(f) u × u = 0


The proofs follow immediately from Formula 1 and properties of determinants; for example, part (a) can be proved
as follows.

Proof (a) Interchanging u and v in 1 interchanges the rows of the three determinants on the right side of 1 and
hence changes the sign of each component in the cross product. Thus u × v = −(v × u).

The proofs of the remaining parts are left as exercises.

EXAMPLE 3 Standard Unit Vectors


Consider the vectors
i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)

These vectors each have length 1 and lie along the coordinate axes (Figure 3.5.1). They are called the
standard unit vectors in 3-space. Every vector $\mathbf{v} = (v_1, v_2, v_3)$ in 3-space is expressible in terms of
i, j, and k since we can write
\[
\mathbf{v} = (v_1, v_2, v_3) = v_1(1, 0, 0) + v_2(0, 1, 0) + v_3(0, 0, 1) = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}
\]
For example,
(2, −3, 4) = 2i − 3j + 4k

From 1 we obtain
\[
\mathbf{i} \times \mathbf{j} = \left( \begin{vmatrix} 0 & 0 \\ 1 & 0 \end{vmatrix},\;
-\begin{vmatrix} 1 & 0 \\ 0 & 0 \end{vmatrix},\;
\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} \right) = (0, 0, 1) = \mathbf{k}
\]

Figure 3.5.1 The standard unit vectors

You should have no trouble obtaining the following results:

i × i = 0     j × j = 0     k × k = 0
i × j = k     j × k = i     k × i = j
j × i = −k    k × j = −i    i × k = −j

Figure 3.5.2 is helpful for remembering these results. Referring to this diagram, the cross product of two
consecutive vectors going clockwise is the next vector around, and the cross product of two consecutive vectors
going counterclockwise is the negative of the next vector around.

Figure 3.5.2


Determinant Form of Cross Product 


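It is also worth noting that a cross product can be represented symbolically in the form
\[
\mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix}
= \begin{vmatrix} u_2 & u_3 \\ v_2 & v_3 \end{vmatrix}\mathbf{i}
- \begin{vmatrix} u_1 & u_3 \\ v_1 & v_3 \end{vmatrix}\mathbf{j}
+ \begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix}\mathbf{k} \tag{4}
\]
For example, if u = (1, 2, −2) and v = (3, 0, 1), then
\[
\mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 2 & -2 \\ 3 & 0 & 1 \end{vmatrix} = 2\mathbf{i} - 7\mathbf{j} - 6\mathbf{k}
\]
which agrees with the result obtained in Example 1.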


WARNING

It is not true in general that u × (v × w) = (u × v) × w. For example,
i × (j × j) = i × 0 = 0
and
(i × j) × j = k × j = −i
so
i × (j × j) ≠ (i × j) × j
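The failure of associativity, and the two vector triple product formulas in parts (d) and (e) of Theorem 3.5.1, can be checked on concrete vectors. The sketch below does this with integer tuples; the sample vectors u, v, w and the helper names are assumptions made here for illustration, and agreement on one example does not replace the proofs requested in Exercises 38 and 39.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)


def scale(k, u):
    return tuple(k * a for a in u)


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


i, j = (1, 0, 0), (0, 1, 0)
print(cross(i, cross(j, j)))  # (0, 0, 0)
print(cross(cross(i, j), j))  # (-1, 0, 0), that is, -i

# Sample vectors (illustrative) for parts (d) and (e) of Theorem 3.5.1.
u, v, w = (3, 2, -1), (0, 2, -3), (2, 6, 7)

left_d = cross(u, cross(v, w))
right_d = sub(scale(dot(u, w), v), scale(dot(u, v), w))  # (u.w)v - (u.v)w

left_e = cross(cross(u, v), w)
right_e = sub(scale(dot(u, w), v), scale(dot(v, w), u))  # (u.w)v - (v.w)u

print(left_d == right_d, left_e == right_e)  # True True
```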


We know from Theorem 3.5.1 that u × v is orthogonal to both u and v. If u and v are nonzero vectors, it can be
shown that the direction of u × v can be determined using the following “right-hand rule” (Figure 3.5.3): Let θ be
the angle between u and v, and suppose u is rotated through the angle θ until it coincides with v. If the fingers of
the right hand are cupped so that they point in the direction of rotation, then the thumb indicates (roughly) the
direction of u × v.

Figure 3.5.3

You may find it instructive to practice this rule with the products
i × j = k, j × k = i, k × i = j

Geometric Interpretation of Cross Product


If u and v are vectors in 3-space, then the norm of u × v has a useful geometric interpretation. Lagrange's identity,
given in Theorem 3.5.1, states that
\[
\|\mathbf{u} \times \mathbf{v}\|^2 = \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2 \tag{5}
\]
If θ denotes the angle between u and v, then $\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\| \|\mathbf{v}\| \cos\theta$, so 5 can be rewritten as
\[
\begin{aligned}
\|\mathbf{u} \times \mathbf{v}\|^2 &= \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 \cos^2\theta \\
&= \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 (1 - \cos^2\theta) \\
&= \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 \sin^2\theta
\end{aligned}
\]
Since 0 ≤ θ ≤ π, it follows that sin θ ≥ 0, so this can be rewritten as
\[
\|\mathbf{u} \times \mathbf{v}\| = \|\mathbf{u}\| \|\mathbf{v}\| \sin\theta \tag{6}
\]
But ||v|| sin θ is the altitude of the parallelogram determined by u and v (Figure 3.5.4). Thus, from 6, the area A of
this parallelogram is given by
\[
A = (\text{base})(\text{altitude}) = \|\mathbf{u}\| \|\mathbf{v}\| \sin\theta = \|\mathbf{u} \times \mathbf{v}\|
\]
This result is even correct if u and v are collinear, since the parallelogram determined by u and v has zero area and
from 6 we have u × v = 0 because θ = 0 in this case. Thus we have the following theorem.


THEOREM 3.5.3 Area of a Parallelogram

If u and v are vectors in 3-space, then ||u × v|| is equal to the area of the parallelogram determined by u
and v.


EXAMPLE 4 Area of a Triangle
Find the area of the triangle determined by the points $P_1(2, 2, 0)$, $P_2(-1, 0, 2)$, and $P_3(0, 4, 3)$.

Solution The area A of the triangle is $\tfrac{1}{2}$ the area of the parallelogram determined by the vectors
$\overrightarrow{P_1P_2}$ and $\overrightarrow{P_1P_3}$ (Figure 3.5.5). Using the method discussed in Example 1 of Section 3.1,
$\overrightarrow{P_1P_2} = (-3, -2, 2)$ and $\overrightarrow{P_1P_3} = (-2, 2, 3)$. It follows that
\[
\overrightarrow{P_1P_2} \times \overrightarrow{P_1P_3} = (-10, 5, -10)
\]
(verify) and consequently that
\[
A = \tfrac{1}{2} \left\| \overrightarrow{P_1P_2} \times \overrightarrow{P_1P_3} \right\| = \tfrac{1}{2}(15) = \tfrac{15}{2}
\]
DEFINITION 2

If u, v, and w are vectors in 3-space, then
u · (v × w)
is called the scalar triple product of u, v, and w.

Figure 3.5.4

Figure 3.5.5 (triangle with vertices $P_1(2, 2, 0)$, $P_2(-1, 0, 2)$, $P_3(0, 4, 3)$)


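The scalar triple product of $\mathbf{u} = (u_1, u_2, u_3)$, $\mathbf{v} = (v_1, v_2, v_3)$, and $\mathbf{w} = (w_1, w_2, w_3)$ can be calculated from the
formula
\[
\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} \tag{7}
\]
This follows from Formula 4 since
\[
\begin{aligned}
\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) &= \mathbf{u} \cdot \left( \begin{vmatrix} v_2 & v_3 \\ w_2 & w_3 \end{vmatrix}\mathbf{i}
- \begin{vmatrix} v_1 & v_3 \\ w_1 & w_3 \end{vmatrix}\mathbf{j}
+ \begin{vmatrix} v_1 & v_2 \\ w_1 & w_2 \end{vmatrix}\mathbf{k} \right) \\
&= \begin{vmatrix} v_2 & v_3 \\ w_2 & w_3 \end{vmatrix} u_1
- \begin{vmatrix} v_1 & v_3 \\ w_1 & w_3 \end{vmatrix} u_2
+ \begin{vmatrix} v_1 & v_2 \\ w_1 & w_2 \end{vmatrix} u_3 \\
&= \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix}
\end{aligned}
\]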


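EXAMPLE 5 Calculating a Scalar Triple Product

Calculate the scalar triple product u · (v × w) of the vectors
u = 3i − 2j − 5k, v = i + 4j − 4k, w = 3j + 2k

Solution From 7,
\[
\begin{aligned}
\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) &= \begin{vmatrix} 3 & -2 & -5 \\ 1 & 4 & -4 \\ 0 & 3 & 2 \end{vmatrix} \\
&= 3\begin{vmatrix} 4 & -4 \\ 3 & 2 \end{vmatrix} - (-2)\begin{vmatrix} 1 & -4 \\ 0 & 2 \end{vmatrix} + (-5)\begin{vmatrix} 1 & 4 \\ 0 & 3 \end{vmatrix} \\
&= 60 + 4 - 15 = 49
\end{aligned}
\]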
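Formula 7 says that the scalar triple product can be computed either as a dot product with a cross product or as a 3 × 3 determinant. The sketch below computes it both ways for the vectors of Example 5, and also evaluates the two cyclic rearrangements; the helper names `dot`, `cross`, and `det3` are illustrative choices.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)


def det3(rows):
    """3 x 3 determinant by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


u = (3, -2, -5)
v = (1, 4, -4)
w = (0, 3, 2)

print(dot(u, cross(v, w)))  # 49
print(det3((u, v, w)))      # 49

# Cyclic rearrangements of u, v, w give the same value.
print(dot(w, cross(u, v)), dot(v, cross(w, u)))  # 49 49
```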


Remark The symbol (u · v) × w makes no sense because we cannot form the cross product of a scalar and a
vector. Thus, no ambiguity arises if we write u · v × w rather than u · (v × w). However, for clarity we will usually
keep the parentheses.

It follows from 7 that
u · (v × w) = w · (u × v) = v · (w × u)
since the 3 × 3 determinants that represent these products can be obtained from one another by two row
interchanges. (Verify.) These relationships can be remembered by moving the vectors u, v, and w clockwise around
the vertices of the triangle in Figure 3.5.6.

Figure 3.5.6


Geometric Interpretation of Determinants 


The next theorem provides a useful geometric interpretation of 2 x 2 and 3 x 3 determinants. 


THEOREM 3.5.4

(a) The absolute value of the determinant
\[
\det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}
\]
is equal to the area of the parallelogram in 2-space determined by the vectors $\mathbf{u} = (u_1, u_2)$ and
$\mathbf{v} = (v_1, v_2)$. (See Figure 3.5.7a.)
(b) The absolute value of the determinant
\[
\det \begin{bmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{bmatrix}
\]
is equal to the volume of the parallelepiped in 3-space determined by the vectors $\mathbf{u} = (u_1, u_2, u_3)$,
$\mathbf{v} = (v_1, v_2, v_3)$, and $\mathbf{w} = (w_1, w_2, w_3)$. (See Figure 3.5.7b.)

Figure 3.5.7 (panels (a), (b), (c))


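Proof (a) The key to the proof is to use Theorem 3.5.3. However, that theorem applies to vectors in 3-space,
whereas $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$ are vectors in 2-space. To circumvent this “dimension problem,” we will
view u and v as vectors in the xy-plane of an xyz-coordinate system (Figure 3.5.7c), in which case these vectors are
expressed as $\mathbf{u} = (u_1, u_2, 0)$ and $\mathbf{v} = (v_1, v_2, 0)$. Thus
\[
\mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & 0 \\ v_1 & v_2 & 0 \end{vmatrix}
= \begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix}\mathbf{k}
= \det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}\mathbf{k}
\]
It now follows from Theorem 3.5.3 and the fact that ||k|| = 1 that the area A of the parallelogram determined by u
and v is
\[
A = \|\mathbf{u} \times \mathbf{v}\| = \left\| \det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}\mathbf{k} \right\|
= \left| \det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix} \right| \|\mathbf{k}\|
= \left| \det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix} \right|
\]
which completes the proof.

Proof (b) As shown in Figure 3.5.8, take the base of the parallelepiped determined by u, v, and w to be the
parallelogram determined by v and w. It follows from Theorem 3.5.3 that the area of the base is ||v × w|| and, as
illustrated in Figure 3.5.8, the height h of the parallelepiped is the length of the orthogonal projection of u on v × w.
Therefore, by Formula 12 of Section 3.3,
\[
h = \|\operatorname{proj}_{\mathbf{v} \times \mathbf{w}} \mathbf{u}\| = \frac{|\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})|}{\|\mathbf{v} \times \mathbf{w}\|}
\]
It follows that the volume V of the parallelepiped is
\[
V = (\text{area of base}) \cdot \text{height} = \|\mathbf{v} \times \mathbf{w}\| \frac{|\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})|}{\|\mathbf{v} \times \mathbf{w}\|} = |\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})|
\]
so from 7,
\[
V = \left| \det \begin{bmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{bmatrix} \right| \tag{8}
\]
which completes the proof.

Figure 3.5.8 (the height is $h = \|\operatorname{proj}_{\mathbf{v} \times \mathbf{w}} \mathbf{u}\|$)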
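The embedding argument in the proof of part (a) can be traced on a small example. In the sketch below, two vectors in 2-space are padded with a zero third component, and the cross product is seen to have only a k-component whose absolute value is the 2 × 2 determinant. The sample vectors and the function name `parallelogram_area_2d` are assumptions made for this illustration.

```python
def cross(u, v):
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)


def parallelogram_area_2d(u, v):
    """|det [[u1, u2], [v1, v2]]|, as in Theorem 3.5.4(a)."""
    return abs(u[0] * v[1] - u[1] * v[0])


u, v = (3, 2), (3, 1)

# View u and v as vectors in the xy-plane of 3-space.
n = cross((u[0], u[1], 0), (v[0], v[1], 0))
print(n)                            # (0, 0, -3): a multiple of k
print(parallelogram_area_2d(u, v))  # 3, which equals |n[2]|
```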


Remark If V denotes the volume of the parallelepiped determined by vectors u, v, and w, then it follows from
Formulas 7 and 8 that
\[
V = \begin{bmatrix} \text{volume of parallelepiped} \\ \text{determined by } \mathbf{u}, \mathbf{v}, \text{ and } \mathbf{w} \end{bmatrix} = |\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})| \tag{9}
\]
From this result and the discussion immediately following Definition 3 of Section 3.2, we can conclude that
\[
\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \pm V
\]
where the + or − results depending on whether u makes an acute or an obtuse angle with v × w.

Formula 9 leads to a useful test for ascertaining whether three given vectors lie in the same plane. Since three
vectors not in the same plane determine a parallelepiped of positive volume, it follows from 9 that
|u · (v × w)| = 0 if and only if the vectors u, v, and w lie in the same plane. Thus we have the following result.


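THEOREM 3.5.5

If the vectors $\mathbf{u} = (u_1, u_2, u_3)$, $\mathbf{v} = (v_1, v_2, v_3)$, and $\mathbf{w} = (w_1, w_2, w_3)$ have the same initial point, then
they lie in the same plane if and only if
\[
\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} = 0
\]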
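Formula 9 and Theorem 3.5.5 together give a volume computation and a coplanarity test from the same determinant. The sketch below implements both for tuples of numbers. The function names and the optional tolerance parameter `tol` are assumptions introduced here: with integer or exact rational entries a tolerance of 0 is appropriate, while floating-point entries would need a small positive tolerance chosen by the user.

```python
def scalar_triple(u, v, w):
    """u . (v x w), computed as the 3 x 3 determinant of Formula 7."""
    (a, b, c), (d, e, f), (g, h, i) = u, v, w
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def volume(u, v, w):
    """Volume of the parallelepiped determined by u, v, and w (Formula 9)."""
    return abs(scalar_triple(u, v, w))


def coplanar(u, v, w, tol=0):
    """True if u, v, w (with a common initial point) lie in one plane."""
    return abs(scalar_triple(u, v, w)) <= tol


print(volume((2, 0, 0), (0, 3, 0), (0, 0, 4)))    # 24, a 2 x 3 x 4 box
print(coplanar((1, 0, 0), (0, 1, 0), (1, 1, 0)))  # True: all lie in the xy-plane
print(coplanar((1, 0, 0), (0, 1, 0), (0, 0, 1)))  # False
```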


Concept Review 


Cross product of two vectors 
Determinant form of cross product 


Scalar triple product 


Skills 


Compute the cross product of two vectors u and v in $R^3$.
Know the geometric relationship of u × v to u and v.
Know the properties of the cross product (listed in Theorem 3.5.2).
Compute the scalar triple product of three vectors in 3-space.
Know the geometric interpretation of the scalar triple product.
Compute the areas of triangles and parallelograms determined by two vectors or three points in 2-space
or 3-space.
Use the scalar triple product to determine whether three given vectors in 3-space are coplanar.


Exercise Set 3.5 


In Exercises 1-2, let u = (3, 2, −1), v = (0, 2, −3), and w = (2, 6, 7). Compute the indicated vectors.

1. (a) v × w
(b) u × (v × w)
(c) (u × v) × w

Answer:
(a) (32, −6, −4)
(b) (−14, −20, −82)
(c) (27, 40, −42)

2. (a) (u × v) × (v × w)
(b) u × (v − 2w)
(c) (u × v) − 2w
In Exercises 3-6, use the cross product to find a vector that is orthogonal to both u and v.
3. u = (−6, 4, 2), v = (3, 1, 5)

Answer:
(18, 36, −18)
4. u = (1, 1, −2), v = (2, −1, 2)
5. u = (−2, 1, 5), v = (3, 0, −3)

Answer:
(−3, 9, −3)
6. u = (3, 3, 1), v = (0, 4, 2)


In Exercises 7-10, find the area of the parallelogram determined by the given vectors u and v.
7. u = (1, −1, 2), v = (0, 3, 1)

Answer:
$\sqrt{59}$

8. u = (3, −1, 4), v = (6, −2, 8)
9. u = (2, 3, 0), v = (−1, 2, −2)

Answer:
$\sqrt{101}$

10. u = (1, 1, 1), v = (3, 2, −5)
In Exercises 11-12, find the area of the parallelogram with the given vertices.
11. $P_1(1, 2)$, $P_2(4, 4)$, $P_3(7, 5)$, $P_4(4, 3)$

Answer:
3
12. $P_1(3, 2)$, $P_2(5, 4)$, $P_3(9, 4)$, $P_4(7, 2)$


In Exercises 13-14, find the area of the triangle with the given vertices.
13. A(2, 0), B(3, 4), C(−1, 2)

Answer:
7
14. A(1, 1), B(2, 2), C(3, −3)


In Exercises 15-16, find the area of the triangle in 3-space that has the given vertices.
15. $P_1(2, 6, -1)$, $P_2(1, 1, 1)$, $P_3(4, 6, 2)$
Answer:
$\dfrac{\sqrt{374}}{2}$
16. P(1, −1, 2), Q(0, 3, 4), R(6, 1, 8)


In Exercises 17-18, find the volume of the parallelepiped with sides u, v, and w.
17. u = (2, −6, 2), v = (0, 4, −2), w = (2, 2, −4)
Answer:
16
18. u = (3, 1, 2), v = (4, 5, 1), w = (1, 2, 4)


In Exercises 19-20, determine whether u, v, and w lie in the same plane when positioned so that their initial 
points coincide. 


19. u = (−1, −2, 1), v = (3, 0, −2), w = (5, −4, 0)
Answer:
The vectors do not lie in the same plane.
20. u = (5, −2, 1), v = (4, −1, 1), w = (1, −1, 0)


In Exercises 21-24, compute the scalar triple product u · (v × w).
21. u = (−2, 0, 6), v = (1, −3, 1), w = (−5, −1, 1)
Answer:
−92
22. u = (−1, 2, 4), v = (3, 4, −2), w = (−1, 2, 5)
23. u = (a, 0, 0), v = (0, b, 0), w = (0, 0, c)
Answer:
abc
24. u = (3, −1, 6), v = (2, 4, 3), w = (5, −1, 2)
In Exercises 25-26, suppose that u · (v × w) = 3. Find
25. (a) u · (w × v)
(b) (v × w) · u
(c) w · (u × v)

Answer:
(a) −3
(b) 3
(c) 3

26. (a) v · (u × w)
(b) (u × w) · v
(c) v · (w × w)

27. (a) Find the area of the triangle having vertices A(1, 0, 1), B(0, 2, 3), and C(2, 1, 0).
(b) Use the result of part (a) to find the length of the altitude from vertex C to side AB.

Answer:
(a) $\dfrac{\sqrt{26}}{2}$
(b) $\dfrac{\sqrt{26}}{3}$

28. Use the cross product to find the sine of the angle between the vectors u = (2, 3, −6) and v = (2, 3, 6).

29. Simplify (u + v) × (u − v).

Answer:
2(v × u)

30. Let $\mathbf{a} = (a_1, a_2, a_3)$, $\mathbf{b} = (b_1, b_2, b_3)$, $\mathbf{c} = (c_1, c_2, c_3)$, and $\mathbf{d} = (d_1, d_2, d_3)$. Show that
(a + d) · (b × c) = a · (b × c) + d · (b × c)

31. Let u, v, and w be nonzero vectors in 3-space with the same initial point, but such that no two of them are
collinear. Show that
(a) u × (v × w) lies in the plane determined by v and w.
(b) (u × v) × w lies in the plane determined by u and v.

32. Prove the following identities.
(a) (u + kv) × v = u × v
(b) u · (v × z) = −(u × z) · v

33. Prove: If a, b, c, and d lie in the same plane, then (a × b) × (c × d) = 0.

34. Prove: If θ is the angle between u and v and u · v ≠ 0, then tan θ = ||u × v|| / (u · v).

35. Show that if u, v, and w are vectors in $R^3$ no two of which are collinear, then u × (v × w) lies in the plane
determined by v and w.

36. It is a theorem of solid geometry that the volume of a tetrahedron is $\tfrac{1}{3}$(area of base) · (height). Use this result
to prove that the volume of a tetrahedron whose sides are the vectors a, b, and c is $\tfrac{1}{6}|\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})|$ (see the
accompanying figure).

Figure Ex-36


37. Use the result of Exercise 36 to find the volume of the tetrahedron with vertices P, Q, R, S.
(a) P(−1, 2, 0), Q(2, 1, −3), R(1, 1, 1), S(3, −2, 3)
(b) P(0, 0, 0), Q(1, 2, −1), R(3, 4, 0), S(−1, −3, 4)


38. Prove part (d) of Theorem 3.5.1. [Hint: First prove the result in the case where w = i = (1, 0, 0), then when
w = j = (0, 1, 0), and then when w = k = (0, 0, 1). Finally, prove it for an arbitrary vector $\mathbf{w} = (w_1, w_2, w_3)$
by writing $\mathbf{w} = w_1\mathbf{i} + w_2\mathbf{j} + w_3\mathbf{k}$.]

39. Prove part (e) of Theorem 3.5.1. [Hint: Apply part (a) of Theorem 3.5.2 to the result in part (d) of Theorem
3.5.1.]


40. Prove: 
(a) Prove (b) of Theorem 3.5.2. 
(b) Prove (c) of Theorem 3.5.2. 
(c) Prove (d) of Theorem 3.5.2. 
(d) Prove (e) of Theorem 3.5.2. 
(e) Prove (f) of Theorem 3.5.2. 


True-False Exercises 


In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 
(a) The cross product of two nonzero vectors u and v is a nonzero vector if and only if u and v are not parallel. 
Answer: 


True 
(b) A normal vector to a plane can be obtained by taking the cross product of two nonzero and noncollinear vectors 
lying in the plane. 


Answer: 


True 


(c) The scalar triple product of u, v, and w determines a vector whose length is equal to the volume of the 
parallelepiped determined by u, v, and w. 


Answer: 


False 


(d) If u and v are vectors in 3-space, then ||v × u|| is equal to the area of the parallelogram determined by u and v.


Answer: 


True 


(e) For all vectors u, v, and w in 3-space, the vectors (u × v) × w and u × (v × w) are the same.
Answer: 


False 


(f) If u, v, and w are vectors in $R^3$, where u is nonzero and u × v = u × w, then v = w.
Answer: 


False 




Chapter 3 Supplementary Exercises 


1. Let u = (−2, 0, 4), v = (3, −1, 6), and w = (2, −5, −5). Compute
(a) 3v − 2u
(b) ||u + v + w||
(c) the distance between −3u and v + 5w
(d) $\operatorname{proj}_{\mathbf{w}} \mathbf{u}$
(e) u · (v × w)
(f) (−5v + w) × ((u · v)w)

Answer:
(a) 3v − 2u = (13, −3, 10)
(b) ||u + v + w|| = $\sqrt{70}$
(c) $\sqrt{774}$
(d) $\operatorname{proj}_{\mathbf{w}} \mathbf{u} = -\tfrac{4}{9}(2, -5, -5)$
(e) u · (v × w) = −122
(f) (−5v + w) × ((u · v)w) = (−3150, −2430, 1170)


2. Repeat Exercise 1 for the vectors u = 3i − 5j + k, v = −2i + 2k, and w = −j + 4k.

3. Repeat parts (a)-(d) of Exercise 1 for the vectors u = (−2, 6, 2, 1), v = (−3, 0, 8, 0), and
w = (9, 1, −6, −6).

Answer:
(a) 3v − 2u = (−5, −12, 20, −2)
(b) ||u + v + w|| = $\sqrt{106}$
(c) $\sqrt{2810}$
(d) $\operatorname{proj}_{\mathbf{w}} \mathbf{u} = -\tfrac{15}{77}(9, 1, -6, -6)$

4. Repeat parts (a)-(d) of Exercise 1 for the vectors u = (0, 5, 0, −1, −2), v = (1, −1, 6, −2, 0), and
w = (−4, −1, 4, 0, 2).


In Exercises 5—6, determine whether the given set of vectors forms an orthogonal set. If so, normalize each 
vector to form an orthonormal set. 


5. (−32, −1, 19), (3, −1, 5), (1, 6, 2)
Answer:
Not an orthogonal set

6. (−2, 0, 1), (1, 1, 2), (1, −5, 2)
7. (a) The set of all vectors in $R^2$ that are orthogonal to a nonzero vector is what kind of geometric object?
(b) The set of all vectors in $R^3$ that are orthogonal to a nonzero vector is what kind of geometric object?
(c) The set of all vectors in $R^2$ that are orthogonal to two noncollinear vectors is what kind of geometric
object?
(d) The set of all vectors in $R^3$ that are orthogonal to two noncollinear vectors is what kind of geometric
object?


Answer: 


(a) A line through the origin, perpendicular to the given vector. 
(b) A plane through the origin, perpendicular to the given vector. 
(c) {0} (the origin) 


(d) A line through the origin, perpendicular to the plane containing the two noncollinear vectors. 


- Show that ¥, = & i =) and ¥3 = a g, _ A are orthonormal vectors, and find a third vector ¥3 for 
which $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthonormal set.
9. True or False: If u and v are nonzero vectors such that $\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$, then u and v are
orthogonal.
Answer:
True

10. True or False: If u is orthogonal to v + w, then u is orthogonal to v and w.

11. Consider the points P(3, −1, 4), Q(6, 0, 2), and R(5, 1, 1). Find the point S in $R^3$ whose first
component is −1 and such that $\overrightarrow{PQ}$ is parallel to $\overrightarrow{RS}$.

Answer:
S(−1, −1, 5)
12. Consider the points P(−3, 1, 0, 6), Q(0, 5, 1, −2), and R(−4, 1, 4, 0). Find the point S in $R^4$ whose
third component is 6 and such that $\overrightarrow{PQ}$ is parallel to $\overrightarrow{RS}$.

13. Using the points in Exercise 11, find the cosine of the angle between the vectors $\overrightarrow{PQ}$ and $\overrightarrow{PR}$.
Answer:
$\sqrt{\dfrac{14}{17}}$
14. Using the points in Exercise 12, find the cosine of the angle between the vectors $\overrightarrow{PQ}$ and $\overrightarrow{PR}$.


15. Find the distance between the point P(−3, 1, 3) and the plane 5x + z = 3y − 4.

Answer:
$\dfrac{11}{\sqrt{35}}$

16. Show that the planes 3x − y + 6z = 7 and −6x + 2y − 12z = 1 are parallel, and find the distance
between the planes.


In Exercises 17—22, find vector and parametric equations for the line or plane in question. 

17. The plane in $R^3$ that contains the points P(−2, 1, 3), Q(−1, −1, 1), and R(3, 0, −2).
Answer:
Vector equation: $(x, y, z) = (-2, 1, 3) + t_1(1, -2, -2) + t_2(5, -1, -5)$;
parametric equations: $x = -2 + t_1 + 5t_2$, $y = 1 - 2t_1 - t_2$, $z = 3 - 2t_1 - 5t_2$
18. The line in $R^3$ that contains the point P(−1, 6, 0) and is orthogonal to the plane 4x − z = 5.

19. The line in $R^2$ that is parallel to the vector v = (8, −1) and contains the point P(0, −3).

Answer:
Vector equation: (x, y) = (0, −3) + t(8, −1);
parametric equations: x = 8t, y = −3 − t
20. The plane in $R^3$ that contains the point P(−2, 1, 0) and is parallel to the plane −8x + 6y − z = 4.
21. The line in $R^2$ with equation y = 3x − 5.

Answer:
A possible answer is vector equation: (x, y) = (0, −5) + t(1, 3); parametric equations:
x = t, y = −5 + 3t
22. The plane in $R^3$ with equation 2x − 6y + 3z = 5.


In Exercises 23—25, find a point-normal equation for the given plane. 


23. The plane that is represented by the vector equation 
$(x, y, z) = (-1, 5, 6) + t_1(0, -1, 3) + t_2(2, -1, 0)$.

Answer:
3(x + 1) + 6(y − 5) + 2(z − 6) = 0

24. The plane that contains the point P(−5, 1, 0) and is orthogonal to the line with parametric equations
x = 3 − 5t, y = 2, and z = 7.

25. The plane that passes through the points P(9, 0, 4), Q(−1, 4, 3), and R(0, 6, −2).

Answer:
−18(x − 9) − 51y − 24(z − 4) = 0


26. Suppose that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ and $\{\mathbf{w}_1, \mathbf{w}_2\}$ are two sets of vectors such that $\mathbf{v}_i$ and $\mathbf{w}_j$ are orthogonal for
all i and j. Prove that if $a_1, a_2, a_3, b_1, b_2$ are any scalars, then the vectors $\mathbf{v} = a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + a_3\mathbf{v}_3$ and
$\mathbf{w} = b_1\mathbf{w}_1 + b_2\mathbf{w}_2$ are orthogonal.

27. Prove that if two vectors u and v in $R^2$ are orthogonal to a nonzero vector w in $R^2$, then u and v are scalar
multiples of each other.

28. Prove that ||u + v|| = ||u|| + ||v|| if and only if u and v are parallel vectors.

29. The equation Ax + By = 0 represents a line through the origin in $R^2$ if A and B are not both zero. What
does this equation represent in $R^3$ if you think of it as Ax + By + 0z = 0? Explain.


Answer: 


A plane 




CHAPTER 4 General Vector Spaces


CHAPTER CONTENTS 


4.1. Real Vector Spaces
4.2. Subspaces
4.3. Linear Independence
4.4. Coordinates and Basis
4.5. Dimension
4.6. Change of Basis
4.7. Row Space, Column Space, and Null Space
4.8. Rank, Nullity, and the Fundamental Matrix Spaces
4.9. Matrix Transformations from $R^n$ to $R^m$
4.10. Properties of Matrix Transformations
4.11. Geometry of Matrix Operators on $R^2$
4.12. Dynamical Systems and Markov Chains


INTRODUCTION 


Recall that we began our study of vectors by viewing them as directed line segments 
(arrows). We then extended this idea by introducing rectangular coordinate systems, which 
enabled us to view vectors as ordered pairs and ordered triples of real numbers. As we 
developed properties of these vectors we noticed patterns in various formulas that enabled 
us to extend the notion of a vector to an n-tuple of real numbers. Although n-tuples took
us outside the realm of our “visual experience,” they gave us a valuable tool for
understanding and studying systems of linear equations. In this chapter we will extend the
concept of a vector yet again by using the most important algebraic properties of vectors
in $R^n$ as axioms. These axioms, if satisfied by a set of objects, will enable us to think of
those objects as vectors. 




4.1 Real Vector Spaces 


In this section we will extend the concept of a vector by using the basic properties of vectors in $R^n$ as axioms, which if satisfied
by a set of objects, guarantee that those objects behave like familiar vectors. 


Vector Space Axioms 


The following definition consists of ten axioms, eight of which are properties of vectors in $R^n$ that were stated in Theorem 3.1.1.
It is important to keep in mind that one does not prove axioms; rather, they are assumptions that serve as the starting point for 
proving theorems. 


Vector space scalars can be real numbers or complex 
numbers. Vector spaces with real scalars are called real 
vector spaces and those with complex scalars are called 
complex vector spaces. For now we will be concerned 
exclusively with real vector spaces. We will consider 
complex vector spaces later. 


DEFINITION 1 


Let V be an arbitrary nonempty set of objects on which two operations are defined: addition, and multiplication by
scalars. By addition we mean a rule for associating with each pair of objects u and v in V an object u + v, called the
sum of u and v; by scalar multiplication we mean a rule for associating with each scalar k and each object u in V an
object ku, called the scalar multiple of u by k. If the following axioms are satisfied by all objects u, v, w in V and all
scalars k and m, then we call V a vector space and we call the objects in V vectors.

1. If u and v are objects in V, then u + v is in V.
2. u + v = v + u
3. u + (v + w) = (u + v) + w
4. There is an object 0 in V, called a zero vector for V, such that 0 + u = u + 0 = u for all u in V.
5. For each u in V, there is an object −u in V, called a negative of u, such that u + (−u) = (−u) + u = 0.
6. If k is any scalar and u is any object in V, then ku is in V.
7. k(u + v) = ku + kv
8. (k + m)u = ku + mu
9. k(mu) = (km)(u)
10. 1u = u

Observe that the definition of a vector space does not specify the nature of the vectors or the operations. Any kind of object can 
be a vector, and the operations of addition and scalar multiplication need not have any relationship to those on $R^n$. The only
requirement is that the ten vector space axioms be satisfied. In the examples that follow we will use four basic steps to show 
that a set with two operations is a vector space. 


To Show that a Set with Two Operations is a Vector Space 
Step 1 Identify the set V of objects that will become vectors. 


Step 2 Identify the addition and scalar multiplication operations on V. 

Step 3 Verify Axioms 1 and 6; that is, adding two vectors in V produces a vector in V, and multiplying a vector in V by
a scalar also produces a vector in V. Axiom 1 is called closure under addition, and Axiom 6 is called closure under 
scalar multiplication. 


Step 4 Confirm that Axioms 2, 3, 4, 5, 7, 8, 9, and 10 hold. 


Hermann Günther Grassmann (1809-1877)


Historical Note The notion of an “abstract vector space” evolved over many years and had many contributors. The 
idea crystallized with the work of the German mathematician H. G. Grassmann, who published a paper in 1862 in which 
he considered abstract systems of unspecified elements on which he defined formal operations of addition and scalar 
multiplication. Grassmann's work was controversial, and others, including Augustin Cauchy (p. 137), laid reasonable 
claim to the idea. 

[Image: (c)Sueddeutsche Zeitung Photo/The Image Works] 


Our first example is the simplest of all vector spaces in that it contains only one object. Since Axiom 4 requires that every 
vector space contain a zero vector, the object will have to be that vector. 


EXAMPLE 1 The Zero Vector Space

Let V consist of a single object, which we denote by 0, and define
0 + 0 = 0 and k0 = 0
for all scalars k. It is easy to check that all the vector space axioms are satisfied. We call this the zero vector
space.


Our second example is one of the most important of all vector spaces: the familiar space $R^n$. It should not be surprising that
the operations on $R^n$ satisfy the vector space axioms because those axioms were based on known properties of operations on $R^n$.


EXAMPLE 2 $R^n$ Is a Vector Space

Let $V = R^n$, and define the vector space operations on V to be the usual operations of addition and scalar
multiplication of n-tuples; that is,
\[
\begin{aligned}
\mathbf{u} + \mathbf{v} &= (u_1, u_2, \ldots, u_n) + (v_1, v_2, \ldots, v_n) = (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n) \\
k\mathbf{u} &= (ku_1, ku_2, \ldots, ku_n)
\end{aligned}
\]
The set $V = R^n$ is closed under addition and scalar multiplication because the foregoing operations produce
n-tuples as their end result, and these operations satisfy Axioms 2, 3, 4, 5, 7, 8, 9, and 10 by virtue of Theorem
3.1.1.
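The four-step procedure can be mimicked in code for $R^n$. The sketch below spot-checks the ten axioms on randomly chosen vectors with rational entries, using exact `Fraction` arithmetic so that equality tests are meaningful. The function names, the choice n = 4, the number of trials, and the fixed seed are all assumptions made for this illustration, and passing such a check is evidence for the axioms on the samples tried, not a proof.

```python
from fractions import Fraction
import random


def add(u, v):
    """Componentwise addition of two n-tuples."""
    return tuple(a + b for a, b in zip(u, v))


def scale(k, u):
    """Scalar multiple of an n-tuple."""
    return tuple(k * a for a in u)


def random_scalar(rng):
    return Fraction(rng.randint(-9, 9), rng.randint(1, 9))


def random_vector(n, rng):
    return tuple(random_scalar(rng) for _ in range(n))


def check_axioms(n=4, trials=100, seed=0):
    rng = random.Random(seed)
    zero = (Fraction(0),) * n
    for _ in range(trials):
        u, v, w = (random_vector(n, rng) for _ in range(3))
        k, m = random_scalar(rng), random_scalar(rng)
        neg_u = scale(-1, u)
        assert len(add(u, v)) == n                                   # Axiom 1
        assert add(u, v) == add(v, u)                                # Axiom 2
        assert add(u, add(v, w)) == add(add(u, v), w)                # Axiom 3
        assert add(zero, u) == u and add(u, zero) == u               # Axiom 4
        assert add(u, neg_u) == zero and add(neg_u, u) == zero       # Axiom 5
        assert len(scale(k, u)) == n                                 # Axiom 6
        assert scale(k, add(u, v)) == add(scale(k, u), scale(k, v))  # Axiom 7
        assert scale(k + m, u) == add(scale(k, u), scale(m, u))      # Axiom 8
        assert scale(k, scale(m, u)) == scale(k * m, u)              # Axiom 9
        assert scale(1, u) == u                                      # Axiom 10
    return True


print(check_axioms())  # True
```

Floating-point entries would not be suitable here, because rounding can make, for example, the two sides of Axiom 3 differ in the last digit.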


Our next example is a generalization of $R^n$ in which we allow vectors to have infinitely many components.

EXAMPLE 3 The Vector Space of Infinite Sequences of Real Numbers

Let V consist of objects of the form
\[
\mathbf{u} = (u_1, u_2, \ldots, u_n, \ldots)
\]
in which $u_1, u_2, \ldots, u_n, \ldots$ is an infinite sequence of real numbers. We define two infinite sequences to be equal if
their corresponding components are equal, and we define addition and scalar multiplication componentwise by
\[
\begin{aligned}
\mathbf{u} + \mathbf{v} &= (u_1, u_2, \ldots, u_n, \ldots) + (v_1, v_2, \ldots, v_n, \ldots) \\
&= (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n, \ldots) \\
k\mathbf{u} &= (ku_1, ku_2, \ldots, ku_n, \ldots)
\end{aligned}
\]
We leave it as an exercise to confirm that V with these operations is a vector space. We will denote this vector
space by the symbol $R^\infty$.


In the next example our vectors will be matrices. This may be a little confusing at first because matrices are composed of rows 
and columns, which are themselves vectors (row vectors and column vectors). However, here we will not be concerned with the 
individual rows and columns but rather with the properties of the matrix operations as they relate to the matrix as a whole. 


Note that Equation 1 involves three different addition
operations: the addition operation on vectors, the 
addition operation on matrices, and the addition 
operation on real numbers. 


EXAMPLE 4 A Vector Space of $2 \times 2$ Matrices


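Let V be the set of $2 \times 2$ matrices with real entries, and take the vector space operations on V to be the usual
operations of matrix addition and scalar multiplication; that is,

$$\begin{aligned}
\mathbf{u} + \mathbf{v} &= \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} + \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix} = \begin{bmatrix} u_{11} + v_{11} & u_{12} + v_{12} \\ u_{21} + v_{21} & u_{22} + v_{22} \end{bmatrix} \\
k\mathbf{u} &= k\begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \begin{bmatrix} ku_{11} & ku_{12} \\ ku_{21} & ku_{22} \end{bmatrix}
\end{aligned} \tag{1}$$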


The set V is closed under addition and scalar multiplication because the foregoing operations produce $2 \times 2$
matrices as the end result. Thus, it remains to confirm that Axioms 2, 3, 4, 5, 7, 8, 9, and 10 hold. Some of these
are standard properties of matrix operations. For example, Axiom 2 follows from Theorem 1.4.1a since

$$\mathbf{u} + \mathbf{v} = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} + \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix} = \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix} + \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \mathbf{v} + \mathbf{u}$$

Similarly, Axioms 3, 7, 8, and 9 follow from parts (b), (h), (j), and (l), respectively, of that theorem (verify). This
leaves Axioms 4, 5, and 10 that remain to be verified.


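To confirm that Axiom 4 is satisfied, we must find a $2 \times 2$ matrix $\mathbf{0}$ in V for which $\mathbf{u} + \mathbf{0} = \mathbf{0} + \mathbf{u} = \mathbf{u}$ for all $2 \times 2$
matrices $\mathbf{u}$ in V. We can do this by taking
$$\mathbf{0} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$

With this definition,
$$\mathbf{0} + \mathbf{u} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \mathbf{u}$$
and similarly $\mathbf{u} + \mathbf{0} = \mathbf{u}$. To verify that Axiom 5 holds we must show that each object $\mathbf{u}$ in V has a negative $-\mathbf{u}$ in
V such that $\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$ and $(-\mathbf{u}) + \mathbf{u} = \mathbf{0}$. This can be done by defining the negative of $\mathbf{u}$ to be
$$-\mathbf{u} = \begin{bmatrix} -u_{11} & -u_{12} \\ -u_{21} & -u_{22} \end{bmatrix}$$

With this definition,
$$\mathbf{u} + (-\mathbf{u}) = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} + \begin{bmatrix} -u_{11} & -u_{12} \\ -u_{21} & -u_{22} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = \mathbf{0}$$
and similarly $(-\mathbf{u}) + \mathbf{u} = \mathbf{0}$. Finally, Axiom 10 holds because

$$1\mathbf{u} = 1\begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \mathbf{u}$$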


EXAMPLE 5 The Vector Space of $m \times n$ Matrices


Example 4 is a special case of a more general class of vector spaces. You should have no trouble adapting the
argument used in that example to show that the set V of all $m \times n$ matrices with the usual matrix operations of
addition and scalar multiplication is a vector space. We will denote this vector space by the symbol $M_{mn}$. Thus,
for example, the vector space in Example 4 is denoted as $M_{22}$.


In Example 6 the functions were defined on the entire
interval $(-\infty, \infty)$. However, the arguments used in
that example apply as well on all subintervals of
$(-\infty, \infty)$, such as a closed interval [a, b] or an open
interval (a, b). We will denote the vector spaces of
functions on these intervals by F[a, b] and F(a, b),
respectively.


EXAMPLE 6 The Vector Space of Real-Valued Functions


Let V be the set of real-valued functions that are defined at each x in the interval $(-\infty, \infty)$. If $\mathbf{f} = f(x)$ and
$\mathbf{g} = g(x)$ are two functions in V and if k is any scalar, then define the operations of addition and scalar
multiplication by


$$(\mathbf{f} + \mathbf{g})(x) = f(x) + g(x) \tag{2}$$


$$(k\mathbf{f})(x) = kf(x) \tag{3}$$


One way to think about these operations is to view the numbers f(x) and g(x) as “components” of $\mathbf{f}$ and $\mathbf{g}$ at the
point x, in which case Equations 2 and 3 state that two functions are added by adding corresponding components,
and a function is multiplied by a scalar by multiplying each component by that scalar, exactly as in $R^n$ and $R^\infty$.
This idea is illustrated in parts (a) and (b) of Figure 4.1.1. The set V with these operations is denoted by the
symbol $F(-\infty, \infty)$. We can prove that this is a vector space as follows:




Axioms 1 and 6 These closure axioms require that sums and scalar multiples of functions that are defined at each x in the
interval $(-\infty, \infty)$ are also defined at each x in that interval. This follows from Formulas 2 and 3.


Axiom 4 This axiom requires that there exists a function $\mathbf{0}$ in $F(-\infty, \infty)$ which, when added to any other
function $\mathbf{f}$ in $F(-\infty, \infty)$, produces $\mathbf{f}$ back again as the result. The function whose value at every point x in the
interval $(-\infty, \infty)$ is zero has this property. Geometrically, the graph of the function $\mathbf{0}$ is the line that
coincides with the x-axis.

Axiom 5 This axiom requires that for each function $\mathbf{f}$ in $F(-\infty, \infty)$ there exists a function $-\mathbf{f}$ in
$F(-\infty, \infty)$ which, when added to $\mathbf{f}$, produces the function $\mathbf{0}$. The function defined by $(-\mathbf{f})(x) = -f(x)$ has
this property. The graph of $-\mathbf{f}$ can be obtained by reflecting the graph of $\mathbf{f}$ about the x-axis (Figure 4.1.1c).

Axioms 2, 3, 7, 8, 9, 10 The validity of each of these axioms follows from properties of real numbers. For example,
if $\mathbf{f}$ and $\mathbf{g}$ are functions in $F(-\infty, \infty)$, then Axiom 2 requires that $\mathbf{f} + \mathbf{g} = \mathbf{g} + \mathbf{f}$. This follows from the
computation


$$(\mathbf{f} + \mathbf{g})(x) = f(x) + g(x) = g(x) + f(x) = (\mathbf{g} + \mathbf{f})(x)$$
in which the first and last equalities follow from 2, and the middle equality is a property of real numbers. We will
leave the proofs of the remaining parts as exercises.


Figure 4.1.1 (parts (a), (b), and (c))
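The pointwise operations of Example 6 translate directly into functions that return functions. The sketch below is only an illustration under stated assumptions: the names `f_add`, `f_scale`, `f_neg`, and `zero` are hypothetical, the sample functions are arbitrary, and checking finitely many points x is a sanity check rather than a proof.

```python
import math

def f_add(f, g):
    """Return f + g, defined pointwise by (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def f_scale(k, f):
    """Return kf, defined pointwise by (kf)(x) = k * f(x)."""
    return lambda x: k * f(x)

def f_neg(f):
    """Return -f, defined pointwise by (-f)(x) = -f(x)."""
    return lambda x: -f(x)

def zero(x):
    """The zero vector: the function whose value is 0 at every x."""
    return 0.0

f = math.sin                            # sample vectors (illustrative choices)
g = lambda x: x * x
points = [-2.0, -0.5, 0.0, 1.0, 3.5]    # a finite sample of the real line

axiom2 = all(f_add(f, g)(x) == f_add(g, f)(x) for x in points)   # f + g = g + f
axiom4 = all(f_add(f, zero)(x) == f(x) for x in points)          # f + 0 = f
axiom5 = all(f_add(f, f_neg(f))(x) == 0.0 for x in points)       # f + (-f) = 0
print(axiom2, axiom4, axiom5)
```

Each "vector" here is an entire function, and every axiom check reduces to a statement about real numbers at a single point x, which mirrors the argument given for Axiom 2.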


It is important to recognize that you cannot impose any two operations on any set V and expect the vector space axioms to hold.
For example, if V is the set of n-tuples with positive components, and if the standard operations from $R^n$ are used, then V is not
closed under scalar multiplication, because if $\mathbf{u}$ is a nonzero n-tuple in V, then $(-1)\mathbf{u}$ has at least one negative component and
hence is not in V. The following is a less obvious example in which only one of the ten vector space axioms fails to hold.


EXAMPLE 7 A Set That Is Not a Vector Space


Let $V = R^2$ and define addition and scalar multiplication operations as follows: If $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$,
then define
$$\mathbf{u} + \mathbf{v} = (u_1 + v_1, u_2 + v_2)$$
and if k is any real number, then define
$$k\mathbf{u} = (ku_1, 0)$$

For example, if $\mathbf{u} = (2, 4)$, $\mathbf{v} = (-3, 5)$, and $k = 7$, then

$$\mathbf{u} + \mathbf{v} = (2 + (-3), 4 + 5) = (-1, 9)$$

$$k\mathbf{u} = 7\mathbf{u} = (7 \cdot 2, 0) = (14, 0)$$
The addition operation is the standard one from $R^2$, but the scalar multiplication is not. In the exercises we will
ask you to show that the first nine vector space axioms are satisfied. However, Axiom 10 fails to hold for certain
vectors. For example, if $\mathbf{u} = (u_1, u_2)$ is such that $u_2 \neq 0$, then


$$1\mathbf{u} = 1(u_1, u_2) = (1 \cdot u_1, 0) = (u_1, 0) \neq \mathbf{u}$$


Thus, V is not a vector space with the stated operations.
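The failure of Axiom 10 in Example 7 is easy to reproduce. The following sketch encodes the two operations as stated in the example (the function names `add` and `scale` are illustrative assumptions) and repeats the sample computation with u = (2, 4), v = (-3, 5), and k = 7.

```python
def add(u, v):
    """Standard addition on pairs: (u1 + v1, u2 + v2)."""
    return (u[0] + v[0], u[1] + v[1])

def scale(k, u):
    """Nonstandard scalar multiplication of Example 7: k(u1, u2) = (k*u1, 0)."""
    return (k * u[0], 0)

u, v, k = (2, 4), (-3, 5), 7

print(add(u, v))      # (-1, 9)
print(scale(k, u))    # (14, 0)

# Axiom 7 still holds for this sample: k(u + v) = ku + kv.
axiom7_holds = scale(k, add(u, v)) == add(scale(k, u), scale(k, v))

# Axiom 10 requires 1u = u, but the second component is wiped out.
axiom10_holds = scale(1, u) == u
print(axiom7_holds, axiom10_holds)   # True False
```

A single counterexample such as this one is enough to show that a set with given operations is not a vector space, whereas showing that an axiom holds requires an argument for all vectors.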


Our final example will be an unusual vector space that we have included to illustrate how varied vector spaces can be. Since the 
objects in this space will be real numbers, it will be important for you to keep track of which operations are intended as vector 
operations and which ones as ordinary operations on real numbers. 


EXAMPLE 8 An Unusual Vector Space


Let V be the set of positive real numbers, and define the operations on V to be
$$u + v = uv \quad [\text{Vector addition is numerical multiplication.}]$$
$$ku = u^k \quad [\text{Scalar multiplication is numerical exponentiation.}]$$
Thus, for example, $1 + 1 = 1$ and $(2)(1) = 1^2 = 1$. Strange indeed, but nevertheless the set V with these
operations satisfies the 10 vector space axioms and hence is a vector space. We will confirm Axioms 4, 5, and 7,
and leave the others as exercises.


• Axiom 4: The zero vector in this space is the number 1 (i.e., $\mathbf{0} = 1$) since
$$u + 1 = u \cdot 1 = u$$


• Axiom 5: The negative of a vector u is its reciprocal (i.e., $-u = 1/u$) since
$$u + \frac{1}{u} = u\left(\frac{1}{u}\right) = 1 \ (= \mathbf{0})$$


• Axiom 7: $k(u + v) = (uv)^k = u^k v^k = (ku) + (kv)$
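Because the operations of Example 8 reuse ordinary multiplication and exponentiation, it helps to give the vector operations their own names. The sketch below does that with hypothetical helpers `vadd`, `smul`, and `neg`; it uses exact fractions and integer scalars so the comparisons are exact, and it checks only sample values, so it illustrates the axioms rather than proving them.

```python
from fractions import Fraction

def vadd(u, v):
    """Vector addition in Example 8 is ordinary multiplication."""
    return u * v

def smul(k, u):
    """Scalar multiplication in Example 8 is exponentiation: ku = u**k."""
    return u ** k

ZERO = Fraction(1)            # the zero vector of this space is the number 1

def neg(u):
    """The negative of u in this space is the reciprocal 1/u."""
    return 1 / u

u, v = Fraction(3, 2), Fraction(5)    # sample positive numbers (illustrative)
k, m = 2, -3                          # sample integer scalars

axiom4 = vadd(u, ZERO) == u                                        # u + 0 = u
axiom5 = vadd(u, neg(u)) == ZERO                                   # u + (-u) = 0
axiom7 = smul(k, vadd(u, v)) == vadd(smul(k, u), smul(k, v))       # k(u+v) = ku + kv
axiom8 = smul(k + m, u) == vadd(smul(k, u), smul(m, u))            # (k+m)u = ku + mu
print(axiom4, axiom5, axiom7, axiom8)
```

Read through the translation, Axiom 8 is the familiar law of exponents $u^{k+m} = u^k u^m$.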


Some Properties of Vectors 


The following is our first theorem about general vector spaces. As you will see, its proof is very formal with each step being 
justified by a vector space axiom or a known property of real numbers. There will not be many rigidly formal proofs of this type 
in the text, but we have included these to reinforce the idea that the familiar properties of vectors can all be derived from the 
vector space axioms. 


THEOREM 4.1.1 


Let V be a vector space, u a vector in V, and k a scalar; then: 


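(a) $0\mathbf{u} = \mathbf{0}$
(b) $k\mathbf{0} = \mathbf{0}$
(c) $(-1)\mathbf{u} = -\mathbf{u}$


(d) If $k\mathbf{u} = \mathbf{0}$, then $k = 0$ or $\mathbf{u} = \mathbf{0}$.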


We will prove parts (a) and (c) and leave proofs of the remaining parts as exercises. 


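Proof (a) We can write


$$\begin{aligned}
0\mathbf{u} + 0\mathbf{u} &= (0 + 0)\mathbf{u} && [\text{Axiom 8}] \\
&= 0\mathbf{u} && [\text{Property of the number 0}]
\end{aligned}$$


By Axiom 5 the vector $0\mathbf{u}$ has a negative, $-0\mathbf{u}$. Adding this negative to both sides above yields
$$[0\mathbf{u} + 0\mathbf{u}] + (-0\mathbf{u}) = 0\mathbf{u} + (-0\mathbf{u})$$


or
$$\begin{aligned}
0\mathbf{u} + [0\mathbf{u} + (-0\mathbf{u})] &= 0\mathbf{u} + (-0\mathbf{u}) && [\text{Axiom 3}] \\
0\mathbf{u} + \mathbf{0} &= \mathbf{0} && [\text{Axiom 5}] \\
0\mathbf{u} &= \mathbf{0} && [\text{Axiom 4}]
\end{aligned}$$

Proof (c) To prove that $(-1)\mathbf{u} = -\mathbf{u}$, we must show that $\mathbf{u} + (-1)\mathbf{u} = \mathbf{0}$. The proof is as follows:


$$\begin{aligned}
\mathbf{u} + (-1)\mathbf{u} &= 1\mathbf{u} + (-1)\mathbf{u} && [\text{Axiom 10}] \\
&= (1 + (-1))\mathbf{u} && [\text{Axiom 8}] \\
&= 0\mathbf{u} && [\text{Property of numbers}] \\
&= \mathbf{0} && [\text{Part (a) of this theorem}]
\end{aligned}$$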


A Closing Observation 


This section of the text is very important to the overall plan of linear algebra in that it establishes a common thread between
such diverse mathematical objects as geometric vectors, vectors in $R^n$, infinite sequences, matrices, and real-valued functions,
to name a few. As a result, whenever we discover a new theorem about general vector spaces, we will at the same time be
discovering a theorem about geometric vectors, vectors in $R^n$, sequences, matrices, real-valued functions, and about any new
kinds of vectors that we might discover.


To illustrate this idea, consider what the rather innocent-looking result in part (a) of Theorem 4.1.1 says about the vector space 
in Example 8. Keeping in mind that the vectors in that space are positive real numbers, that scalar multiplication means 
numerical exponentiation, and that the zero vector is the number 1, the equation 


$$0\mathbf{u} = \mathbf{0}$$
is a statement of the fact that if u is a positive real number, then
$$u^0 = 1$$
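The same translation can be carried out for the other parts of Theorem 4.1.1. The sketch below is an illustration only: it assumes the operations of Example 8 (scalar multiplication is exponentiation, the zero vector is 1, the negative is the reciprocal), uses hypothetical helper names, and checks a handful of sample values with exact fractions.

```python
from fractions import Fraction

ZERO = Fraction(1)                    # zero vector of the space in Example 8

def smul(k, u):
    """Scalar multiplication in Example 8: ku means u**k."""
    return u ** k

def neg(u):
    """Negative in Example 8: -u means 1/u."""
    return 1 / u

samples = [Fraction(1, 3), Fraction(2), Fraction(7, 5)]   # illustrative vectors

part_a = all(smul(0, u) == ZERO for u in samples)          # 0u = 0  reads  u**0 = 1
part_b = all(smul(k, ZERO) == ZERO for k in (-2, 0, 3))    # k0 = 0  reads  1**k = 1
part_c = all(smul(-1, u) == neg(u) for u in samples)       # (-1)u = -u  reads  u**(-1) = 1/u
print(part_a, part_b, part_c)
```

Each part of the theorem becomes a familiar rule of exponents once the vector operations are read in this space.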


Concept Review 

• Vector space

• Closure under addition

• Closure under scalar multiplication

• Examples of vector spaces


Skills
• Determine whether a given set with two operations is a vector space.

• Show that a set with two operations is not a vector space by demonstrating that at least one of the vector space axioms
fails.


Exercise Set 4.1 


1. Let V be the set of all ordered pairs of real numbers, and consider the following addition and scalar multiplication operations
on $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$:
$$\mathbf{u} + \mathbf{v} = (u_1 + v_1, u_2 + v_2), \quad k\mathbf{u} = (0, ku_2)$$
(a) Compute $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ for $\mathbf{u} = (-1, 2)$, $\mathbf{v} = (3, 4)$, and $k = 3$.
(b) In words, explain why V is closed under addition and scalar multiplication.
(c) Since addition on V is the standard addition operation on $R^2$, certain vector space axioms hold for V because they are
known to hold for $R^2$. Which axioms are they?


(d) Show that Axioms 7, 8, and 9 hold. 


(e) Show that Axiom 10 fails and hence that V is not a vector space under the given operations. 


Answer: 


(a) $\mathbf{u} + \mathbf{v} = (2, 6)$, $3\mathbf{u} = (0, 6)$
(c) Axioms 1-5 


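2. Let V be the set of all ordered pairs of real numbers, and consider the following addition and scalar multiplication operations
on $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$:
$$\mathbf{u} + \mathbf{v} = (u_1 + v_1 + 1, u_2 + v_2 + 1), \quad k\mathbf{u} = (ku_1, ku_2)$$
(a) Compute $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ for $\mathbf{u} = (0, 4)$, $\mathbf{v} = (1, -3)$, and $k = 2$.
(b) Show that $(0, 0) \neq \mathbf{0}$.
(c) Show that $(-1, -1) = \mathbf{0}$.
(d) Show that Axiom 5 holds by producing an ordered pair $-\mathbf{u}$ such that $\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$ for $\mathbf{u} = (u_1, u_2)$.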


(e) Find two vector space axioms that fail to hold. 


In Exercises 3-12, determine whether each set equipped with the given operations is a vector space. For those that are not 
vector spaces identify the vector space axioms that fail. 


3. The set of all real numbers with the standard operations of addition and multiplication.


Answer:


The set is a vector space with the given operations.


4. The set of all pairs of real numbers of the form (x, 0) with the standard operations on $R^2$.


5. The set of all pairs of real numbers of the form (x, y), where $x \geq 0$, with the standard operations on $R^2$.


Answer:


Not a vector space. Axioms 5 and 6 fail.


6. The set of all n-tuples of real numbers that have the form (x, x, ..., x) with the standard operations on $R^n$.


7. The set of all triples of real numbers with the standard vector addition but with scalar multiplication defined by


$$k(x, y, z) = (k^2 x, k^2 y, k^2 z)$$


Answer:


Not a vector space. Axiom 8 fails.


8. The set of all $2 \times 2$ invertible matrices with the standard matrix addition and scalar multiplication.


9. The set of all 3 × 2 matrices of the form [matrix illegible in source]
with the standard matrix addition and scalar multiplication.

Answer:


The set is a vector space with the given operations.


10. The set of all real-valued functions f defined everywhere on the real line and such that $f(1) = 0$ with the operations used in
Example 6.


11. The set of all pairs of real numbers of the form (1, x) with the operations
$$(1, y) + (1, y') = (1, y + y') \quad\text{and}\quad k(1, y) = (1, ky)$$

Answer:


The set is a vector space with the given operations.


12. The set of polynomials of the form $a_0 + a_1 x$ with the operations
$$(a_0 + a_1 x) + (b_0 + b_1 x) = (a_0 + b_0) + (a_1 + b_1)x$$
and
$$k(a_0 + a_1 x) = (ka_0) + (ka_1)x$$

13. Verify Axioms 3, 7, 8, and 9 for the vector space given in Example 4.

14. Verify Axioms 1, 2, 3, 7, 8, 9, and 10 for the vector space given in Example 6.

15. With the addition and scalar multiplication operations defined in Example 7, show that $V = R^2$ satisfies Axioms 1-9.

16. Verify Axioms 1, 2, 3, 6, 8, 9, and 10 for the vector space given in Example 8.

17. Show that the set of all points in $R^2$ lying on a line is a vector space with respect to the standard operations of vector
addition and scalar multiplication if and only if the line passes through the origin.

18. Show that the set of all points in $R^3$ lying in a plane is a vector space with respect to the standard operations of vector
addition and scalar multiplication if and only if the plane passes through the origin.


In Exercises 19-21, prove that the given set with the stated operations is a vector space. 


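19. The set $V = \{\mathbf{0}\}$ with the operations of addition and scalar multiplication given in Example 1.


20. The set $R^\infty$ of all infinite sequences of real numbers with the operations of addition and scalar multiplication given in
Example 3.


21. The set $M_{mn}$ of all $m \times n$ matrices with the usual operations of addition and scalar multiplication.

22. Prove part (d) of Theorem 4.1.1.


23. The argument that follows proves that if $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in a vector space V such that $\mathbf{u} + \mathbf{w} = \mathbf{v} + \mathbf{w}$, then $\mathbf{u} = \mathbf{v}$
(the cancellation law for vector addition). As illustrated, justify the steps by filling in the blanks.


$\mathbf{u} + \mathbf{w} = \mathbf{v} + \mathbf{w}$    Hypothesis
$(\mathbf{u} + \mathbf{w}) + (-\mathbf{w}) = (\mathbf{v} + \mathbf{w}) + (-\mathbf{w})$    Add $-\mathbf{w}$ to both sides.
$\mathbf{u} + [\mathbf{w} + (-\mathbf{w})] = \mathbf{v} + [\mathbf{w} + (-\mathbf{w})]$    ______
$\mathbf{u} + \mathbf{0} = \mathbf{v} + \mathbf{0}$    ______
$\mathbf{u} = \mathbf{v}$    ______

24. Let $\mathbf{v}$ be any vector in a vector space V. Prove that $0\mathbf{v} = \mathbf{0}$.


25. Below is a seven-step proof of part (b) of Theorem 4.1.1. Justify each step either by stating that it is true by hypothesis or by
specifying which of the ten vector space axioms applies.


Hypothesis: Let $\mathbf{u}$ be any vector in a vector space V, let $\mathbf{0}$ be the zero vector in V, and let k be a scalar.


Conclusion: Then $k\mathbf{0} = \mathbf{0}$.


Proof:

(1) $k\mathbf{0} + k\mathbf{u} = k(\mathbf{0} + \mathbf{u})$

(2) $= k\mathbf{u}$

(3) Since $k\mathbf{u}$ is in V, $-k\mathbf{u}$ is in V.

(4) Therefore, $(k\mathbf{0} + k\mathbf{u}) + (-k\mathbf{u}) = k\mathbf{u} + (-k\mathbf{u})$.

(5) $k\mathbf{0} + (k\mathbf{u} + (-k\mathbf{u})) = k\mathbf{u} + (-k\mathbf{u})$

(6) $k\mathbf{0} + \mathbf{0} = \mathbf{0}$

(7) $k\mathbf{0} = \mathbf{0}$


26. Let $\mathbf{v}$ be any vector in a vector space V. Prove that $-\mathbf{v} = (-1)\mathbf{v}$.


27. Prove: If $\mathbf{u}$ is a vector in a vector space V and k a scalar such that $k\mathbf{u} = \mathbf{0}$, then either $k = 0$ or $\mathbf{u} = \mathbf{0}$. [Suggestion: Show
that if $k\mathbf{u} = \mathbf{0}$ and $k \neq 0$, then $\mathbf{u} = \mathbf{0}$. The result then follows as a logical consequence of this.]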


True-False Exercises 
In parts (a)—(e) determine whether the statement is true or false, and justify your answer. 
(a) A vector is a directed line segment (an arrow). 

Answer: 


False 


(b) A vector is an n-tuple of real numbers.
Answer: 


False 


(c) A vector is any element of a vector space. 
Answer: 


True 


(d) There is a vector space consisting of exactly two distinct vectors. 
Answer: 


False 


(e) The set of polynomials with degree exactly 1 is a vector space under the operations defined in Exercise 12.
Answer: 


False 




4.2 Subspaces 


It is possible for one vector space to be contained within another. In this section we will explore this idea,
discuss how to recognize such vector spaces, and give a variety of examples that will be used in
our later work.


We will begin with some terminology. 


DEFINITION 1 


A subset W of a vector space V is called a subspace of V if W is itself a vector space under the addition 
and scalar multiplication defined on V. 


In general, to show that a nonempty set W with two operations is a vector space one must verify the ten vector 
space axioms. However, if W is a subspace of a known vector space V, then certain axioms need not be verified 
because they are “inherited” from V. For example, it is not necessary to verify that $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ holds in W
because it holds for all vectors in V including those in W. On the other hand, it is necessary to verify that W is 
closed under addition and scalar multiplication since it is possible that adding two vectors in W or multiplying a 
vector in W by a scalar produces a vector in V that is outside of W (Figure 4.2.1). 


Figure 4.2.1 The vectors $\mathbf{u}$ and $\mathbf{v}$ are in W, but the vectors $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ are not


Those axioms that are not inherited by W are 

Axiom 1—Closure of W under addition 

Axiom 4—Existence of a zero vector in W 

Axiom 5—Existence of a negative in W for every vector in W 
Axiom 6—Closure of W under scalar multiplication 


so these must be verified to prove that W is a subspace of V. However, the following theorem shows that if
Axiom 1 and Axiom 6 hold in W, then Axioms 4 and 5 hold in W as a consequence and hence need not be
verified.


THEOREM 4.2.1 


If W is a set of one or more vectors in a vector space V, then W is a subspace of V if and only if the 
following conditions hold. 
(a) If $\mathbf{u}$ and $\mathbf{v}$ are vectors in W, then $\mathbf{u} + \mathbf{v}$ is in W.


(b) If k is any scalar and $\mathbf{u}$ is any vector in W, then $k\mathbf{u}$ is in W.


In words, Theorem 4.2.1 states that W is a
subspace of V if and only if it is closed under
addition and scalar multiplication.


Proof If W is a subspace of V, then all the vector space axioms hold in W, including Axioms 1 and 6, which
are precisely conditions (a) and (b).


Conversely, assume that conditions (a) and (b) hold. Since these are Axioms 1 and 6, and since Axioms 2, 3, 7,
8, 9, and 10 are inherited from V, we only need to show that Axioms 4 and 5 hold in W. For this purpose, let $\mathbf{u}$
be any vector in W. It follows from condition (b) that $k\mathbf{u}$ is a vector in W for every scalar k. In particular,
$0\mathbf{u} = \mathbf{0}$ and $(-1)\mathbf{u} = -\mathbf{u}$ are in W, which shows that Axioms 4 and 5 hold in W.


Note that every vector space has at least two 
subspaces, itself and its zero subspace. 


EXAMPLE 1 The Zero Subspace


If V is any vector space, and if $W = \{\mathbf{0}\}$ is the subset of V that consists of the zero vector only,
then W is closed under addition and scalar multiplication since


$$\mathbf{0} + \mathbf{0} = \mathbf{0} \quad\text{and}\quad k\mathbf{0} = \mathbf{0}$$
for any scalar k. We call W the zero subspace of V.


EXAMPLE 2 Lines Through the Origin Are Subspaces of $R^2$ and of $R^3$


If W is a line through the origin of either $R^2$ or $R^3$, then adding two vectors on the line W or multiplying a vector
on the line W by a scalar produces another vector on the line W, so W is closed under addition and scalar
multiplication (see Figure 4.2.2 for an illustration in $R^3$).


(a) W is closed under addition. (b) W is closed under scalar 
multiplication. 


Figure 4.2.2 
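The role of the origin can be seen in a small numerical experiment. The following sketch compares the line y = 2x, which passes through the origin, with the shifted line y = 2x + 1, which does not. Both lines, the helper `on_line`, and the sample points are illustrative assumptions rather than part of the example.

```python
def add(u, v):
    """Standard addition in R^2."""
    return (u[0] + v[0], u[1] + v[1])

def scale(k, u):
    """Standard scalar multiplication in R^2."""
    return (k * u[0], k * u[1])

def on_line(p, slope, intercept):
    """True if p = (x, y) satisfies y = slope * x + intercept."""
    x, y = p
    return y == slope * x + intercept

# Line through the origin: y = 2x.
u, v = (1, 2), (3, 6)
through_origin_closed = (on_line(add(u, v), 2, 0)
                         and on_line(scale(-5, u), 2, 0))

# Line that misses the origin: y = 2x + 1.
p, q = (0, 1), (1, 3)
shifted_sum_on_line = on_line(add(p, q), 2, 1)             # (1, 4) is off the line
shifted_multiple_on_line = on_line(scale(2, p), 2, 1)      # (0, 2) is off the line
print(through_origin_closed, shifted_sum_on_line, shifted_multiple_on_line)
```

The shifted line fails both closure conditions of Theorem 4.2.1, which is consistent with the claim in Exercise 17 of Section 4.1 that a line is a vector space under the standard operations only when it passes through the origin.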


EXAMPLE 3 Planes Through the Origin Are Subspaces of $R^3$


If $\mathbf{u}$ and $\mathbf{v}$ are vectors in a plane W through the origin of $R^3$, then it is evident geometrically that $\mathbf{u} + \mathbf{v}$
and $k\mathbf{u}$ lie in the same plane W for any scalar k (Figure 4.2.3). Thus W is closed under addition and
scalar multiplication.


Figure 4.2.3 The vectors $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ both lie in the same plane as $\mathbf{u}$ and $\mathbf{v}$


Table 1 that follows gives a list of subspaces of $R^2$ and of $R^3$ that we have encountered thus far. We will see
later that these are the only subspaces of $R^2$ and of $R^3$.


Table 1
Subspaces of $R^2$              Subspaces of $R^3$
• {0}                           • {0}
• Lines through the origin      • Lines through the origin
• $R^2$                         • Planes through the origin
                                • $R^3$


EXAMPLE 4 A Subset of $R^2$ That Is Not a Subspace


Let W be the set of all points (x, y) in $R^2$ for which x > 0 and y > 0 (the shaded region in Figure
4.2.4). This set is not a subspace of $R^2$ because it is not closed under scalar multiplication. For
example, $\mathbf{v} = (1, 1)$ is a vector in W, but $(-1)\mathbf{v} = (-1, -1)$ is not.


Figure 4.2.4 W is not closed under scalar multiplication
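The counterexample in Example 4 can be written as a membership test. In the sketch below the predicate `in_W` encodes the condition as stated in the example (both coordinates positive); the helper names and the second sample point are illustrative assumptions.

```python
def in_W(p):
    """Membership test for W: both coordinates positive, as stated in Example 4."""
    x, y = p
    return x > 0 and y > 0

def add(u, v):
    """Standard addition in R^2."""
    return (u[0] + v[0], u[1] + v[1])

def scale(k, u):
    """Standard scalar multiplication in R^2."""
    return (k * u[0], k * u[1])

v = (1, 1)
w = (2, 0.5)                                   # a second sample point of W

sum_stays_in_W = in_W(add(v, w))               # addition causes no trouble here
negative_stays_in_W = in_W(scale(-1, v))       # (-1)v = (-1, -1) leaves W
print(sum_stays_in_W, negative_stays_in_W)     # True False
```

One failing scalar multiple is enough: by Theorem 4.2.1, W cannot be a subspace once condition (b) fails for a single vector and scalar.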


EXAMPLE 5 Subspaces of $M_{nn}$


We know from Theorem 1.7.2 that the sum of two symmetric $n \times n$ matrices is symmetric and
that a scalar multiple of a symmetric $n \times n$ matrix is symmetric. Thus, the set of symmetric $n \times n$
matrices is closed under addition and scalar multiplication and hence is a subspace of $M_{nn}$.
Similarly, the sets of upper triangular matrices, lower triangular matrices, and diagonal matrices
are subspaces of $M_{nn}$.


EXAMPLE 6 A Subset of $M_{nn}$ That Is Not a Subspace


The set W of invertible $n \times n$ matrices is not a subspace of $M_{nn}$, failing on two counts: it is not
closed under addition and not closed under scalar multiplication. We will illustrate this with an
example in $M_{22}$ that you can readily adapt to $M_{nn}$. Consider the matrices


$$U = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix} \quad\text{and}\quad V = \begin{bmatrix} -1 & 2 \\ -2 & 5 \end{bmatrix}$$
The matrix $0U$ is the $2 \times 2$ zero matrix and hence is not invertible, and the matrix $U + V$ has a
column of zeros, so it also is not invertible.
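The two failures in Example 6 can be confirmed with 2 × 2 determinants, since a square matrix is invertible exactly when its determinant is nonzero. The sketch below takes U with rows (1, 2), (2, 5) and V with rows (-1, 2), (-2, 5); the helper names `mat_add`, `mat_scale`, and `det2` are illustrative assumptions.

```python
def mat_add(A, B):
    """Entrywise sum of two matrices given as lists of rows."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(k, A):
    """Multiply every entry of A by the scalar k."""
    return [[k * a for a in row] for row in A]

def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

U = [[1, 2], [2, 5]]
V = [[-1, 2], [-2, 5]]

print(det2(U), det2(V))          # 1 -1   (both matrices are invertible)
print(mat_add(U, V))             # [[0, 4], [0, 10]]   (first column is zero)
print(det2(mat_add(U, V)))       # 0      (the sum is not invertible)
print(det2(mat_scale(0, U)))     # 0      (0U is the zero matrix)
```

Both U and V belong to W, yet U + V and 0U do not, so W fails conditions (a) and (b) of Theorem 4.2.1.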


CALCULUS REQUIRED 


EXAMPLE 7 The Subspace $C(-\infty, \infty)$


There is a theorem in calculus which states that a sum of continuous functions is continuous and
that a constant times a continuous function is continuous. Rephrased in vector language, the set
of continuous functions on $(-\infty, \infty)$ is a subspace of $F(-\infty, \infty)$. We will denote this
subspace by $C(-\infty, \infty)$.


CALCULUS REQUIRED 


EXAMPLE 8 Functions with Continuous Derivatives 


A function with a continuous derivative is said to be continuously differentiable. There is a 
theorem in calculus which states that the sum of two continuously differentiable functions is 
continuously differentiable and that a constant times a continuously differentiable function is 
continuously differentiable. Thus, the functions that are continuously differentiable on
$(-\infty, \infty)$ form a subspace of $F(-\infty, \infty)$. We will denote this subspace by
$C^1(-\infty, \infty)$, where the superscript emphasizes that the first derivative is continuous. To take
this a step further, the set of functions with m continuous derivatives on $(-\infty, \infty)$ is a
subspace of $F(-\infty, \infty)$ as is the set of functions with derivatives of all orders on
$(-\infty, \infty)$. We will denote these subspaces by $C^m(-\infty, \infty)$ and $C^\infty(-\infty, \infty)$,
respectively.


EXAMPLE 9 The Subspace of All Polynomials 


Recall that a polynomial is a function that can be expressed in the form
$$p(x) = a_0 + a_1 x + \cdots + a_n x^n \tag{1}$$


where $a_0, a_1, \ldots, a_n$ are constants. It is evident that the sum of two polynomials is a
polynomial and that a constant times a polynomial is a polynomial. Thus, the set W of all
polynomials is closed under addition and scalar multiplication and hence is a subspace of
$F(-\infty, \infty)$. We will denote this space by $P_\infty$.


EXAMPLE 10 The Subspace of Polynomials of Degree $\leq n$


Recall that the degree of a polynomial is the highest power of the variable that occurs with a
nonzero coefficient. Thus, for example, if $a_n \neq 0$ in Formula 1, then that polynomial has degree n.
It is not true that the set W of polynomials with positive degree n is a subspace of $F(-\infty, \infty)$
because that set is not closed under addition. For example, the polynomials


$$1 + 2x + 3x^2 \quad\text{and}\quad 5 + 7x - 3x^2$$


both have degree 2, but their sum has degree 1. What is true, however, is that for each nonnegative
integer n the polynomials of degree n or less form a subspace of $F(-\infty, \infty)$. We will denote
this space by $P_n$.


In this text we regard all constants to be 
polynomials of degree zero. Be aware, however, 
that some authors do not assign a degree to the 
constant 0. 
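The degree drop in Example 10 can be reproduced by storing a polynomial as its list of coefficients $[a_0, a_1, \ldots, a_n]$. This representation and the helper names `poly_add` and `degree` are assumptions made for illustration; `degree` follows the convention stated above that every constant, including 0, has degree zero.

```python
def poly_add(p, q):
    """Add two polynomials given as coefficient lists [a0, a1, ..., an]."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))        # pad the shorter list with zero coefficients
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def degree(p):
    """Highest power with a nonzero coefficient; constants (including 0) get degree 0."""
    for i in range(len(p) - 1, 0, -1):
        if p[i] != 0:
            return i
    return 0

p = [1, 2, 3]        # 1 + 2x + 3x^2
q = [5, 7, -3]       # 5 + 7x - 3x^2
s = poly_add(p, q)   # 6 + 9x + 0x^2

print(degree(p), degree(q))   # 2 2
print(s, degree(s))           # [6, 9, 0] 1
```

The leading coefficients 3 and -3 cancel, so the sum leaves the set of polynomials of degree exactly 2 but remains in $P_2$, the polynomials of degree 2 or less.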


The Hierarchy of Function Spaces 


It is proved in calculus that polynomials are continuous functions and have continuous derivatives of all orders
on $(-\infty, \infty)$. Thus, it follows that $P_\infty$ is not only a subspace of $F(-\infty, \infty)$, as previously observed, but
is also a subspace of $C^\infty(-\infty, \infty)$. We leave it for you to convince yourself that the vector spaces
discussed in Example 7 to Example 10 are “nested” one inside the other as illustrated in Figure 4.2.5.


Figure 4.2.5 


Remark In our previous examples, and as illustrated in Figure 4.2.5, we have only considered functions that
are defined at all points of the interval $(-\infty, \infty)$. Sometimes we will want to consider functions that are
only defined on some subinterval of $(-\infty, \infty)$, say the closed interval [a, b] or the open interval (a, b). In
such cases we will make an appropriate notation change. For example, C[a, b] is the space of continuous
functions on [a, b] and C(a, b) is the space of continuous functions on (a, b).


Building Subspaces 


The following theorem provides a useful way of creating a new subspace from known subspaces. 


THEOREM 4.2.2 


If $W_1, W_2, \ldots, W_r$ are subspaces of a vector space V, then the intersection of these subspaces is also a
subspace of V.


Note that the first step in proving Theorem 4.2.2 
was to establish that W contained at least one 
vector. This is important, for otherwise the 
subsequent argument might be logically correct 
but meaningless. 


Proof Let W be the intersection of the subspaces $W_1, W_2, \ldots, W_r$. This set is not empty because each of these
subspaces contains the zero vector of V, and hence so does their intersection. Thus, it remains to show that W is
closed under addition and scalar multiplication.


To prove closure under addition, let $\mathbf{u}$ and $\mathbf{v}$ be vectors in W. Since W is the intersection of $W_1, W_2, \ldots, W_r$, it
follows that $\mathbf{u}$ and $\mathbf{v}$ also lie in each of these subspaces. Since these subspaces are all closed under addition,
they all contain the vector $\mathbf{u} + \mathbf{v}$ and hence so does their intersection W. This proves that W is closed under
addition. We leave the proof that W is closed under scalar multiplication to you.


Sometimes we will want to find the “smallest” subspace of a vector space V that contains all of the vectors in 
some set of interest. The following definition, which generalizes Definition 4 of Section 3.1, will help us to do 
that. 


If $r = 1$, then Equation 2 has the form
$\mathbf{w} = k_1\mathbf{v}_1$, in which case the linear combination
is just a scalar multiple of $\mathbf{v}_1$.


DEFINITION 2 


If $\mathbf{w}$ is a vector in a vector space V, then $\mathbf{w}$ is said to be a linear combination of the vectors
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ in V if $\mathbf{w}$ can be expressed in the form


$$\mathbf{w} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r \tag{2}$$


where $k_1, k_2, \ldots, k_r$ are scalars. These scalars are called the coefficients of the linear combination.


THEOREM 4.2.3 


If $S = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r\}$ is a nonempty set of vectors in a vector space V, then:
(a) The set W of all possible linear combinations of the vectors in S is a subspace of V. 


(b) The set W in part (a) is the “smallest” subspace of V that contains all of the vectors in S in the sense 
that any other subspace that contains those vectors contains W. 


Proof (a) Let W be the set of all possible linear combinations of the vectors in S. We must show that W is
closed under addition and scalar multiplication. To prove closure under addition, let


$$\mathbf{u} = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + \cdots + c_r\mathbf{w}_r \quad\text{and}\quad \mathbf{v} = k_1\mathbf{w}_1 + k_2\mathbf{w}_2 + \cdots + k_r\mathbf{w}_r$$
be two vectors in W. It follows that their sum can be written as
$$\mathbf{u} + \mathbf{v} = (c_1 + k_1)\mathbf{w}_1 + (c_2 + k_2)\mathbf{w}_2 + \cdots + (c_r + k_r)\mathbf{w}_r$$


which is a linear combination of the vectors in S. Thus, W is closed under addition. We leave it for you to prove
that W is also closed under scalar multiplication and hence is a subspace of V.


Proof (b) Let W' be any subspace of V that contains all of the vectors in S. Since W’ is closed under addition 
and scalar multiplication, it contains all linear combinations of the vectors in S and hence contains W. 
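For readers who want to check their own argument for the step left open in part (a), one possible route to closure under scalar multiplication is sketched here. It assumes a vector $\mathbf{u} = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + \cdots + c_r\mathbf{w}_r$ in W and a scalar k, and it applies Axiom 7 repeatedly across the sum; other write-ups are equally valid.

```latex
\begin{aligned}
k\mathbf{u} &= k(c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + \cdots + c_r\mathbf{w}_r) \\
            &= k(c_1\mathbf{w}_1) + k(c_2\mathbf{w}_2) + \cdots + k(c_r\mathbf{w}_r) && [\text{Axiom 7, applied repeatedly}] \\
            &= (kc_1)\mathbf{w}_1 + (kc_2)\mathbf{w}_2 + \cdots + (kc_r)\mathbf{w}_r && [\text{Axiom 9}]
\end{aligned}
```

The last line is again a linear combination of the vectors in S, with coefficients $kc_1, kc_2, \ldots, kc_r$, so $k\mathbf{u}$ lies in W.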


The following definition gives some important notation and terminology related to Theorem 4.2.3. 


DEFINITION 3 


The subspace of a vector space V that is formed from all possible linear combinations of the vectors in 
a nonempty set S is called the span of S, and we say that the vectors in S span that subspace. If 
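$S = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r\}$, then we denote the span of S by


$$\operatorname{span}\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r\} \quad\text{or}\quad \operatorname{span}(S)$$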


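EXAMPLE 11 The Standard Unit Vectors Span $R^n$


Recall that the standard unit vectors in $R^n$ are
$$\mathbf{e}_1 = (1, 0, 0, \ldots, 0), \quad \mathbf{e}_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad \mathbf{e}_n = (0, 0, 0, \ldots, 1)$$
These vectors span $R^n$ since every vector $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ in $R^n$ can be expressed as
$$\mathbf{v} = v_1\mathbf{e}_1 + v_2\mathbf{e}_2 + \cdots + v_n\mathbf{e}_n$$
which is a linear combination of $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$. Thus, for example, the vectors
$$\mathbf{i} = (1, 0, 0), \quad \mathbf{j} = (0, 1, 0), \quad \mathbf{k} = (0, 0, 1)$$
span $R^3$ since every vector $\mathbf{v} = (a, b, c)$ in this space can be expressed as


$$\mathbf{v} = (a, b, c) = a(1, 0, 0) + b(0, 1, 0) + c(0, 0, 1) = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$$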
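The observation in Example 11 amounts to saying that the components of a vector serve as its coefficients relative to the standard unit vectors. The sketch below checks this for one sample vector; the helper names `standard_unit_vectors` and `linear_combination` and the sample data are illustrative assumptions.

```python
def standard_unit_vectors(n):
    """Return [e_1, ..., e_n], where e_i has a 1 in position i and 0 elsewhere."""
    return [tuple(1 if i == j else 0 for j in range(n)) for i in range(n)]

def linear_combination(coeffs, vectors):
    """Return k_1*v_1 + ... + k_r*v_r for tuples v_i of equal length."""
    n = len(vectors[0])
    total = [0] * n
    for k, vec in zip(coeffs, vectors):
        for j in range(n):
            total[j] += k * vec[j]
    return tuple(total)

e = standard_unit_vectors(3)
v = (4, -1, 7)                              # sample vector in R^3

# Using the components of v as coefficients reproduces v.
rebuilt = linear_combination(v, e)
print(e)          # [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(rebuilt)    # (4, -1, 7)
```

Since the same computation works for any choice of v, no search for coefficients is needed when the spanning set is the standard one.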


EXAMPLE 12 A Geometric View of Spanning in $R^2$ and $R^3$


(a) If $\mathbf{v}$ is a nonzero vector in $R^2$ or $R^3$ that has its initial point at the origin, then span{$\mathbf{v}$}, which
is the set of all scalar multiples of $\mathbf{v}$, is the line through the origin determined by $\mathbf{v}$. You should
be able to visualize this from Figure 4.2.6a by observing that the tip of the vector $k\mathbf{v}$ can be
made to fall at any point on the line by choosing the value of k appropriately.


George William Hill (1838-1914) 


Historical Note The terms linearly independent and linearly dependent were 
introduced by Maxime Bôcher (see p. 7) in his book Introduction to Higher Algebra,
published in 1907. The term linear combination is due to the American mathematician
G. W. Hill, who introduced it in a research paper on planetary motion published in 
1900. Hill was a “loner” who preferred to work out of his home in West Nyack, New 
York, rather than in academia, though he did try lecturing at Columbia University for a 
few years. Interestingly, he apparently returned the teaching salary, indicating that he 
did not need the money and did not want to be bothered looking after it. Although 
technically a mathematician, Hill had little interest in modern developments of 
mathematics and worked almost entirely on the theory of planetary orbits. 

[Image: Courtesy of the American Mathematical Society] 


(b) If $\mathbf{v}_1$ and $\mathbf{v}_2$ are nonzero vectors in $R^3$ that have their initial points at the origin, then
span{$\mathbf{v}_1, \mathbf{v}_2$}, which consists of all linear combinations of $\mathbf{v}_1$ and $\mathbf{v}_2$, is the plane through the
origin determined by these two vectors. You should be able to visualize this from Figure 4.2.6b
by observing that the tip of the vector $k_1\mathbf{v}_1 + k_2\mathbf{v}_2$ can be made to fall at any point in the
plane by adjusting the scalars $k_1$ and $k_2$ to lengthen, shorten, or reverse the directions of the
vectors $k_1\mathbf{v}_1$ and $k_2\mathbf{v}_2$ appropriately.


(a) Span{$\mathbf{v}$} is the line through the origin determined by $\mathbf{v}$.
(b) Span{$\mathbf{v}_1, \mathbf{v}_2$} is the plane through the origin determined by $\mathbf{v}_1$ and $\mathbf{v}_2$.


Figure 4.2.6


EXAMPLE 13 A Spanning Set for $P_n$


The polynomials $1, x, x^2, \ldots, x^n$ span the vector space $P_n$ defined in Example 10 since each
polynomial $\mathbf{p}$ in $P_n$ can be written as

$$\mathbf{p} = a_0 + a_1 x + \cdots + a_n x^n$$

which is a linear combination of $1, x, x^2, \ldots, x^n$. We can denote this by writing

$$P_n = \operatorname{span}\{1, x, x^2, \ldots, x^n\}$$


The next two examples are concerned with two important types of problems: 
• Given a set S of vectors in $R^n$ and a vector $\mathbf{v}$ in $R^n$, determine whether $\mathbf{v}$ is a linear combination of the
vectors in S.


• Given a set S of vectors in $R^n$, determine whether the vectors span $R^n$.


EXAMPLE 14 Linear Combinations 


Consider the vectors $\mathbf{u} = (1, 2, -1)$ and $\mathbf{v} = (6, 4, 2)$ in $R^3$. Show that $\mathbf{w} = (9, 2, 7)$ is a 
linear combination of $\mathbf{u}$ and $\mathbf{v}$ and that $\mathbf{w}' = (4, -1, 8)$ is not a linear combination of $\mathbf{u}$ and $\mathbf{v}$. 


Solution In order for $\mathbf{w}$ to be a linear combination of $\mathbf{u}$ and $\mathbf{v}$, there must be scalars $k_1$ and $k_2$ 
such that $\mathbf{w} = k_1\mathbf{u} + k_2\mathbf{v}$; that is, 
$$(9, 2, 7) = k_1(1, 2, -1) + k_2(6, 4, 2)$$
or 
$$(9, 2, 7) = (k_1 + 6k_2,\ 2k_1 + 4k_2,\ -k_1 + 2k_2)$$


Equating corresponding components gives 

$$\begin{aligned} k_1 + 6k_2 &= 9 \\ 2k_1 + 4k_2 &= 2 \\ -k_1 + 2k_2 &= 7 \end{aligned}$$

Solving this system using Gaussian elimination yields $k_1 = -3$, $k_2 = 2$, so 
$$\mathbf{w} = -3\mathbf{u} + 2\mathbf{v}$$


Similarly, for $\mathbf{w}'$ to be a linear combination of $\mathbf{u}$ and $\mathbf{v}$, there must be scalars $k_1$ and $k_2$ such that 
$\mathbf{w}' = k_1\mathbf{u} + k_2\mathbf{v}$; that is, 

$$(4, -1, 8) = k_1(1, 2, -1) + k_2(6, 4, 2)$$
or 
$$(4, -1, 8) = (k_1 + 6k_2,\ 2k_1 + 4k_2,\ -k_1 + 2k_2)$$


Equating corresponding components gives 

$$\begin{aligned} k_1 + 6k_2 &= 4 \\ 2k_1 + 4k_2 &= -1 \\ -k_1 + 2k_2 &= 8 \end{aligned}$$


This system of equations is inconsistent (verify), so no such scalars $k_1$ and $k_2$ exist. 
Consequently, $\mathbf{w}'$ is not a linear combination of $\mathbf{u}$ and $\mathbf{v}$. 
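
The computation in Example 14 can also be checked by machine. The following is a minimal Python sketch, not part of the original text; the function name solve_combination is hypothetical, and the code assumes exact rational arithmetic from the standard fractions module. It row-reduces the augmented matrix whose columns are the given vectors and returns None when the system is inconsistent.

```python
from fractions import Fraction

def solve_combination(vectors, target):
    """Return coefficients c with sum(c[j] * vectors[j]) == target, or None.

    Hypothetical helper: Gauss-Jordan elimination on the augmented matrix,
    using exact fractions. Free coefficients (if any) are set to zero.
    """
    n_rows, n_cols = len(target), len(vectors)
    # Columns are the given vectors; the last column is the target vector.
    aug = [[Fraction(vectors[j][i]) for j in range(n_cols)] + [Fraction(target[i])]
           for i in range(n_rows)]
    pivots, row = [], 0
    for col in range(n_cols):
        pivot = next((r for r in range(row, n_rows) if aug[r][col] != 0), None)
        if pivot is None:
            continue
        aug[row], aug[pivot] = aug[pivot], aug[row]
        lead = aug[row][col]
        aug[row] = [x / lead for x in aug[row]]
        for r in range(n_rows):
            if r != row and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[row])]
        pivots.append(col)
        row += 1
        if row == n_rows:
            break
    # A remaining row [0 ... 0 | nonzero] means the system is inconsistent.
    if any(aug[r][-1] != 0 for r in range(row, n_rows)):
        return None
    coeffs = [Fraction(0)] * n_cols
    for r, col in enumerate(pivots):
        coeffs[col] = aug[r][-1]
    return coeffs

u, v = (1, 2, -1), (6, 4, 2)
coeffs_w = solve_combination([u, v], (9, 2, 7))         # k1 = -3, k2 = 2
coeffs_w_prime = solve_combination([u, v], (4, -1, 8))  # None (inconsistent)
```

Exact fractions are used rather than floating point so that the test for an inconsistent row is not affected by rounding.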


EXAMPLE 15 Testing for Spanning 
Determine whether $\mathbf{v}_1 = (1, 1, 2)$, $\mathbf{v}_2 = (1, 0, 1)$, and $\mathbf{v}_3 = (2, 1, 3)$ span the vector space $R^3$. 


Solution We must determine whether an arbitrary vector $\mathbf{b} = (b_1, b_2, b_3)$ in $R^3$ can be 
expressed as a linear combination 
$$\mathbf{b} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + k_3\mathbf{v}_3$$
of the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. Expressing this equation in terms of components gives 
$$(b_1, b_2, b_3) = k_1(1, 1, 2) + k_2(1, 0, 1) + k_3(2, 1, 3)$$
or 
$$(b_1, b_2, b_3) = (k_1 + k_2 + 2k_3,\ k_1 + k_3,\ 2k_1 + k_2 + 3k_3)$$
or 
$$\begin{aligned} k_1 + k_2 + 2k_3 &= b_1 \\ k_1 + k_3 &= b_2 \\ 2k_1 + k_2 + 3k_3 &= b_3 \end{aligned}$$
Thus, our problem reduces to ascertaining whether this system is consistent for all values of $b_1$, 
$b_2$, and $b_3$. One way of doing this is to use parts (e) and (g) of Theorem 2.3.8, which state that 
the system is consistent if and only if its coefficient matrix 

$$A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 3 \end{bmatrix}$$

has a nonzero determinant. But this is not the case here; we leave it for you to confirm that 
$\det(A) = 0$, so $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ do not span $R^3$. 
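
The determinant test used in Example 15 is easy to confirm numerically. The short Python sketch below is an illustration only; the helper name det3 is hypothetical and it handles only $3 \times 3$ matrices, by cofactor expansion along the first row.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# The columns of A are the vectors v1, v2, v3 of Example 15.
v1, v2, v3 = (1, 1, 2), (1, 0, 1), (2, 1, 3)
A = [[v1[i], v2[i], v3[i]] for i in range(3)]

spans_R3 = det3(A) != 0  # False here, since det(A) = 0
```

For comparison, the standard unit vectors give the identity matrix, whose determinant is 1, so they do span $R^3$.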


Solution Spaces of Homogeneous Systems 


The solutions of a homogeneous linear system $A\mathbf{x} = \mathbf{0}$ of m equations in n unknowns can be viewed as vectors 
in $R^n$. The following theorem provides a useful insight into the geometric structure of the solution set. 


THEOREM 4.2.4 


The solution set of a homogeneous linear system $A\mathbf{x} = \mathbf{0}$ in n unknowns is a subspace of $R^n$. 


Proof Let W be the solution set for the system. The set W is not empty because it contains at least the trivial 
solution $\mathbf{x} = \mathbf{0}$. 


To show that W is a subspace of $R^n$, we must show that it is closed under addition and scalar multiplication. To 
do this, let $\mathbf{x}_1$ and $\mathbf{x}_2$ be vectors in W. Since these vectors are solutions of $A\mathbf{x} = \mathbf{0}$, we have 
$$A\mathbf{x}_1 = \mathbf{0} \quad\text{and}\quad A\mathbf{x}_2 = \mathbf{0}$$
It follows from these equations and the distributive property of matrix multiplication that 
$$A(\mathbf{x}_1 + \mathbf{x}_2) = A\mathbf{x}_1 + A\mathbf{x}_2 = \mathbf{0} + \mathbf{0} = \mathbf{0}$$
so W is closed under addition. Similarly, if k is any scalar then 
$$A(k\mathbf{x}_1) = kA\mathbf{x}_1 = k\mathbf{0} = \mathbf{0}$$

so W is also closed under scalar multiplication. 

Because the solution set of a homogeneous system in n unknowns is actually a subspace of $R^n$, 
we will generally refer to it as the solution space of the system. 


EXAMPLE 16 Solution Spaces of Homogeneous Systems 


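Consider the linear systems 

(a) $\begin{bmatrix} 1 & -2 & 3 \\ 2 & -4 & 6 \\ 3 & -6 & 9 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$ 

(b) $\begin{bmatrix} 1 & -2 & 3 \\ -3 & 7 & -8 \\ -2 & 4 & -6 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$ 

(c) $\begin{bmatrix} 1 & -2 & 3 \\ -3 & 7 & -8 \\ 4 & 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$ 

(d) $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$ 

Solution 


(a) We leave it for you to verify that the solutions are 
$$x = 2s - 3t, \quad y = s, \quad z = t$$

from which it follows that 

$$x = 2y - 3z \quad\text{or}\quad x - 2y + 3z = 0$$

This is the equation of a plane through the origin that has $\mathbf{n} = (1, -2, 3)$ as a normal. 

(b) We leave it for you to verify that the solutions are 
$$x = -5t, \quad y = -t, \quad z = t$$

which are parametric equations for the line through the origin that is parallel to the vector 
$\mathbf{v} = (-5, -1, 1)$. 

(c) We leave it for you to verify that the only solution is $x = 0$, $y = 0$, $z = 0$, so the solution 
space is $\{\mathbf{0}\}$. 


(d) This linear system is satisfied by all real values of x, y, and z, so the solution space is all of $R^3$. 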
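
Solution spaces like those in Example 16 can be computed mechanically from the reduced row echelon form. The following Python sketch is an illustration under stated assumptions: the names rref and null_space_basis are hypothetical, and exact arithmetic from the standard fractions module is used. It returns one basis vector of the solution space for each free variable.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form and pivot columns, using exact fractions."""
    m = [[Fraction(x) for x in r] for r in rows]
    pivots, row = [], 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(row, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[row], m[pivot] = m[pivot], m[row]
        lead = m[row][col]
        m[row] = [x / lead for x in m[row]]
        for r in range(len(m)):
            if r != row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[row])]
        pivots.append(col)
        row += 1
        if row == len(m):
            break
    return m, pivots

def null_space_basis(rows):
    """Basis for the solution space of Ax = 0 (one vector per free variable)."""
    m, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        vec = [Fraction(0)] * n
        vec[free] = Fraction(1)
        for r, p in enumerate(pivots):
            vec[p] = -m[r][free]   # solve each pivot variable in terms of the free one
        basis.append(vec)
    return basis

basis_b = null_space_basis([[1, -2, 3], [-3, 7, -8], [-2, 4, -6]])  # one vector: a line
basis_c = null_space_basis([[1, -2, 3], [-3, 7, -8], [4, 1, 2]])    # empty: only the origin
basis_d = null_space_basis([[0, 0, 0], [0, 0, 0], [0, 0, 0]])       # three vectors: all of R^3
```

The number of basis vectors returned (0, 1, 2, or 3) distinguishes the origin, a line, a plane, and all of $R^3$; for system (b) the single vector is $(-5, -1, 1)$, in agreement with the solution above.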


Remark Whereas the solution set of every homogeneous system of m equations in n unknowns is a subspace 
of $R^n$, it is never true that the solution set of a nonhomogeneous system of m equations in n unknowns is a 
subspace of $R^n$. There are two possible scenarios: first, the system may not have any solutions at all, and 
second, if there are solutions, then the solution set will not be closed under either addition or under scalar 
multiplication (Exercise 18). 


A Concluding Observation 


It is important to recognize that spanning sets are not unique. For example, any nonzero vector on the line in 
Figure 4.2.6a will span that line, and any two noncollinear vectors in the plane in Figure 4.2.6b will span that 
plane. The following theorem, whose proof we leave as an exercise, states conditions under which two sets of 
vectors will span the same space. 


THEOREM 4.2.5 


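If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ and $S' = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k\}$ are nonempty sets of vectors in a vector space V, 
then 

$$\text{span}\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\} = \text{span}\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k\}$$

if and only if each vector in S is a linear combination of those in $S'$, and each vector in $S'$ is a linear 
combination of those in S. 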
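
For vectors in $R^n$, the condition in Theorem 4.2.5 can be tested numerically. The Python sketch below does not check linear combinations directly; instead it substitutes an equivalent rank comparison (two sets span the same subspace exactly when each set and their union all have the same rank), which relies on the notion of rank that is not developed in this section. The function names rank and same_span are hypothetical, and the sample vectors are those of Exercise 20.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of the matrix whose rows are the given vectors (exact arithmetic)."""
    m = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def same_span(S, T):
    """True if the vector lists S and T span the same subspace of R^n."""
    return rank(S) == rank(T) == rank(list(S) + list(T))

S = [(1, 6, 4), (2, 4, -1), (-1, 2, 5)]
T = [(1, -2, -5), (0, 8, 9)]
spans_match = same_span(S, T)
```

A numerical check of this kind confirms an answer but is not a substitute for the proof requested in the exercises.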


Concept Review 


• Subspace 

• Zero subspace 

• Examples of subspaces 

• Linear combination 

• Span 

• Solution space 

Skills 

• Determine whether a subset of a vector space is a subspace. 

• Show that a subset of a vector space is a subspace. 

• Show that a nonempty subset of a vector space is not a subspace by demonstrating that the set is 
either not closed under addition or not closed under scalar multiplication. 

• Given a set S of vectors in $R^n$ and a vector $\mathbf{v}$ in $R^n$, determine whether $\mathbf{v}$ is a linear combination of 
the vectors in S. 

• Given a set S of vectors in $R^n$, determine whether the vectors in S span $R^n$. 

• Determine whether two nonempty sets of vectors in a vector space V span the same subspace of V. 


Exercise Set 4.2 


1. Use Theorem 4.2.1 to determine which of the following are subspaces of $R^3$. 
(a) All vectors of the form (a, 0, 0). 
(b) All vectors of the form (a, 1, 1). 
(c) All vectors of the form (a, b, c), where b = a + c. 
(d) All vectors of the form (a, b, c), where b = a + c + 1. 
(e) All vectors of the form (a, b, 0). 


Answer: 


(a), (c), (e) 
2. Use Theorem 4.2.1 to determine which of the following are subspaces of $M_{nn}$. 
(a) The set of all diagonal $n \times n$ matrices. 
(b) The set of all $n \times n$ matrices A such that $\det(A) = 0$. 
(c) The set of all $n \times n$ matrices A such that $\text{tr}(A) = 0$. 
(d) The set of all symmetric $n \times n$ matrices. 
(e) The set of all $n \times n$ matrices A such that $A^T = -A$. 
(f) The set of all $n \times n$ matrices A for which $A\mathbf{x} = \mathbf{0}$ has only the trivial solution. 
(g) The set of all $n \times n$ matrices A such that $AB = BA$ for some fixed $n \times n$ matrix B. 


3. Use Theorem 4.2.1 to determine which of the following are subspaces of $P_3$. 
(a) All polynomials $a_0 + a_1 x + a_2 x^2 + a_3 x^3$ for which $a_0 = 0$. 
(b) All polynomials $a_0 + a_1 x + a_2 x^2 + a_3 x^3$ for which $a_0 + a_1 + a_2 + a_3 = 0$. 
(c) All polynomials of the form $a_0 + a_1 x + a_2 x^2 + a_3 x^3$ in which $a_0$, $a_1$, $a_2$, and $a_3$ are integers. 
(d) All polynomials of the form $a_0 + a_1 x$, where $a_0$ and $a_1$ are real numbers. 


Answer: 


(a), (b), (d) 
4. Which of the following are subspaces of $F(-\infty, \infty)$? 
(a) All functions f in $F(-\infty, \infty)$ for which $f(0) = 0$. 
(b) All functions f in $F(-\infty, \infty)$ for which $f(0) = 1$. 
(c) All functions f in $F(-\infty, \infty)$ for which $f(-x) = f(x)$. 
(d) All polynomials of degree 2. 
5. Which of the following are subspaces of $R^\infty$? 
(a) All sequences $\mathbf{v}$ in $R^\infty$ of the form $\mathbf{v} = (v, 0, v, 0, v, 0, \ldots)$. 
(b) All sequences $\mathbf{v}$ in $R^\infty$ of the form $\mathbf{v} = (v, 1, v, 1, v, 1, \ldots)$. 
(c) All sequences $\mathbf{v}$ in $R^\infty$ of the form $\mathbf{v} = (v, 2v, 4v, 8v, 16v, \ldots)$. 
(d) All sequences in $R^\infty$ whose components are 0 from some point on. 


Answer: 


(a), (c), (d) 

6. A line L through the origin in $R^3$ can be represented by parametric equations of the form $x = at$, $y = bt$, 
and $z = ct$. Use these equations to show that L is a subspace of $R^3$ by showing that if $\mathbf{v}_1 = (x_1, y_1, z_1)$ and 
$\mathbf{v}_2 = (x_2, y_2, z_2)$ are points on L and k is any real number, then $k\mathbf{v}_1$ and $\mathbf{v}_1 + \mathbf{v}_2$ are also points on L. 

7. Which of the following are linear combinations of $\mathbf{u} = (0, -2, 2)$ and $\mathbf{v} = (1, 3, -1)$? 

(a) (2, 2, 2) 
(b) (3, 1, 5) 
(c) (0, 4, 5) 
(d) (0, 0, 0) 


Answer: 


(a), (b), (d) 
8. Express the following as linear combinations of $\mathbf{u} = (2, 1, 4)$, $\mathbf{v} = (1, -1, 3)$, and $\mathbf{w} = (3, 2, 5)$. 
(a) (-9, -7, -15) 
(b) (6, 11, 6) 
(c) (0, 0, 0) 
(d) (7, 8, 9) 


9. Which of the following are linear combinations of 


“43 
(b) 
(c) 


(d) f—1 5 
71 


Answer: 


(a), (b), (c) 
10. In each part express the vector as a linear combination of $\mathbf{p}_1 = 2 + x + 4x^2$, $\mathbf{p}_2 = 1 - x + 3x^2$, and 
$\mathbf{p}_3 = 3 + 2x + 5x^2$. 
(a) $-9 - 7x - 15x^2$ 
(b) $6 + 11x + 6x^2$ 
(c) 0 
(d) $7 + 8x + 9x^2$ 
11. In each part, determine whether the given vectors span $R^3$. 
(a) $\mathbf{v}_1 = (2, 2, 2)$, $\mathbf{v}_2 = (0, 0, 3)$, $\mathbf{v}_3 = (0, 1, 1)$ 
(b) $\mathbf{v}_1 = (2, -1, 3)$, $\mathbf{v}_2 = (4, 1, 2)$, $\mathbf{v}_3 = (8, -1, 8)$ 
(c) $\mathbf{v}_1 = (3, 1, 4)$, $\mathbf{v}_2 = (2, -3, 5)$, $\mathbf{v}_3 = (5, -2, 9)$, $\mathbf{v}_4 = (1, 4, -1)$ 
(d) $\mathbf{v}_1 = (1, 2, 6)$, $\mathbf{v}_2 = (3, 4, 1)$, $\mathbf{v}_3 = (4, 3, 1)$, $\mathbf{v}_4 = (3, 3, 1)$ 


Answer: 


(a) The vectors span $R^3$. 
(b) The vectors do not span $R^3$. 
(c) The vectors do not span $R^3$. 
(d) The vectors span $R^3$. 
12. Suppose that $\mathbf{v}_1 = (2, 1, 0, 3)$, $\mathbf{v}_2 = (3, -1, 5, 2)$, and $\mathbf{v}_3 = (-1, 0, 2, 1)$. Which of the following 
vectors are in span$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$? 
(a) (2, 3, -7, 3) 
(b) (0, 0, 0, 0) 
(c) (1, 1, 1, 1) 
(d) (-4, 8, -13, 4) 


13. Determine whether the following polynomials span $P_2$. 

$\mathbf{p}_1 = 1 - x + 2x^2$, $\mathbf{p}_2 = 3 + x$, 
$\mathbf{p}_3 = 5 - x + 4x^2$, $\mathbf{p}_4 = -2 - 2x + 2x^2$ 


Answer: 


The polynomials do not span $P_2$. 


14. Let $\mathbf{f} = \cos^2 x$ and $\mathbf{g} = \sin^2 x$. Which of the following lie in the space spanned by $\mathbf{f}$ and $\mathbf{g}$? 

(a) $\cos 2x$ 

(b) $3 + x^2$ 

(c) 1 

(d) $\sin x$ 

(e) 0 

15. Determine whether the solution space of the system $A\mathbf{x} = \mathbf{0}$ is a line through the origin, a plane through the 
origin, or the origin only. If it is a plane, find an equation for it. If it is a line, find parametric equations for 
it. 

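(a) $A = \begin{bmatrix} -1 & 1 & 1 \\ 3 & -1 & 0 \\ 2 & -4 & -5 \end{bmatrix}$ 
(b) $A = \begin{bmatrix} 1 & -2 & 3 \\ -3 & 6 & 9 \\ -2 & 4 & 6 \end{bmatrix}$ 
(c) $A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}$ 
(d) $A = \begin{bmatrix} 1 & 2 & -6 \\ 1 & 4 & 4 \\ 3 & 10 & 6 \end{bmatrix}$ 
(e) $A = \begin{bmatrix} 1 & -1 & 1 \\ 2 & -1 & 4 \\ 3 & 1 & 11 \end{bmatrix}$ 
(f) $A = \begin{bmatrix} 1 & -3 & 1 \\ 2 & -6 & 2 \\ 3 & -9 & 3 \end{bmatrix}$ 

Answer: 

(a) Line; $x = -\tfrac{1}{2}t$, $y = -\tfrac{3}{2}t$, $z = t$ 
(b) Line; $x = 2t$, $y = t$, $z = 0$ 
(c) Origin 
(d) Origin 
(e) Line; $x = -3t$, $y = -2t$, $z = t$ 
(f) Plane; $x - 3y + z = 0$ 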
16. (Calculus required) Show that the following sets of functions are subspaces of $F(-\infty, \infty)$. 
(a) All continuous functions on $(-\infty, \infty)$. 
(b) All differentiable functions on $(-\infty, \infty)$. 
(c) All differentiable functions on $(-\infty, \infty)$ that satisfy $f' + 2f = 0$. 


17. (Calculus required) Show that the set of continuous functions $f = f(x)$ on $[a, b]$ such that 
$$\int_a^b f(x)\,dx = 0$$
is a subspace of $C[a, b]$. 


18. Show that the solution vectors of a consistent nonhomogeneous system of m linear equations in n 
unknowns do not form a subspace of $R^n$. 


19. Prove Theorem 4.2.5. 


20. Use Theorem 4.2.5 to show that the vectors $\mathbf{v}_1 = (1, 6, 4)$, $\mathbf{v}_2 = (2, 4, -1)$, $\mathbf{v}_3 = (-1, 2, 5)$, and the 
vectors $\mathbf{w}_1 = (1, -2, -5)$, $\mathbf{w}_2 = (0, 8, 9)$ span the same subspace of $R^3$. 


True-False Exercises 
In parts (a)—(k) determine whether the statement is true or false, and justify your answer. 
(a) Every subspace of a vector space is itself a vector space. 

Answer: 


True 


(b) Every vector space is a subspace of itself. 
Answer: 


True 


(c) Every subset of a vector space V that contains the zero vector in V is a subspace of V. 
Answer: 


False 
(d) The set $R^2$ is a subspace of $R^3$. 


Answer: 


False 


(e) The solution set of a consistent linear system $A\mathbf{x} = \mathbf{b}$ of m equations in n unknowns is a subspace of $R^n$. 


Answer: 


False 


(f) The span of any finite set of vectors in a vector space is closed under addition and scalar multiplication. 
Answer: 


True 


(g) The intersection of any two subspaces of a vector space V is a subspace of V. 
Answer: 


True 


(h) The union of any two subspaces of a vector space V is a subspace of V. 
Answer: 


False 


(i) Two subsets of a vector space V that span the same subspace of V must be equal. 
Answer: 


False 


(j) The set of upper triangular $n \times n$ matrices is a subspace of the vector space of all $n \times n$ matrices. 
Answer: 
True 

(k) The polynomials $x - 1$, $(x - 1)^2$, and $(x - 1)^3$ span $P_3$. 
Answer: 


False 




4.3 Linear Independence 


In this section we will consider the question of whether the vectors in a given set are interrelated in the sense 
that one or more of them can be expressed as a linear combination of the others. This is important to know in 
applications because the existence of such relationships often signals that some kind of complication is likely 
to occur. 


Extraneous Vectors 


In a rectangular xy-coordinate system every vector in the plane can be expressed in exactly one way as a 
linear combination of the standard unit vectors. For example, the only way to express the vector (3, 2) as a 
linear combination of i= (1, 0) and j= (0, 1) is 


(3, 2) = 3(1, 0) + 2(0, 1) = 3i+ 2j (1) 


(Figure 4.3.1). Suppose, however, that we were to introduce a third coordinate axis that makes an angle of 45° 
with the x-axis. Call it the w-axis. As illustrated in Figure 4.3.2, the unit vector along the w-axis is 


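$$\mathbf{w} = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$$
Whereas Formula 1 shows the only way to express the vector (3, 2) as a linear combination of $\mathbf{i}$ and $\mathbf{j}$, there 
are infinitely many ways to express this vector as a linear combination of $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{w}$. Three possibilities are 

$$\begin{aligned} (3, 2) &= 3(1, 0) + 2(0, 1) + 0\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right) = 3\mathbf{i} + 2\mathbf{j} + 0\mathbf{w} \\ (3, 2) &= 2(1, 0) + (0, 1) + \sqrt{2}\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right) = 2\mathbf{i} + \mathbf{j} + \sqrt{2}\,\mathbf{w} \\ (3, 2) &= 4(1, 0) + 3(0, 1) - \sqrt{2}\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right) = 4\mathbf{i} + 3\mathbf{j} - \sqrt{2}\,\mathbf{w} \end{aligned}$$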


In short, by introducing a superfluous axis we created the complication of having multiple ways of assigning 
coordinates to points in the plane. What makes the vector w superfluous is the fact that it can be expressed as 
a linear combination of the vectors i and j, namely, 


$$\mathbf{w} = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right) = \frac{1}{\sqrt{2}}\mathbf{i} + \frac{1}{\sqrt{2}}\mathbf{j}$$
Thus, one of our main tasks in this section will be to develop ways of ascertaining whether one vector in a set 
S is a linear combination of other vectors in S. 


Figure 4.3.2 


Linear Independence and Dependence 


We will often apply the terms linearly 
independent and linearly dependent to the 
vectors themselves rather than to the set. 


DEFINITION 1 


If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a nonempty set of vectors in a vector space V, then the vector equation 
$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0}$$

has at least one solution, namely, 
$$k_1 = 0, \quad k_2 = 0, \quad \ldots, \quad k_r = 0$$


We call this the trivial solution. If this is the only solution, then S is said to be a linearly independent 
set. If there are solutions in addition to the trivial solution, then S is said to be a linearly dependent 
set. 


EXAMPLE 1 Linear Independence of the Standard Unit Vectors in $R^n$ 


The most basic linearly independent set in $R^n$ is the set of standard unit vectors 
$$\mathbf{e}_1 = (1, 0, 0, \ldots, 0), \quad \mathbf{e}_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad \mathbf{e}_n = (0, 0, 0, \ldots, 1)$$


For notational simplicity, we will prove the linear independence in $R^3$ of 
$$\mathbf{i} = (1, 0, 0), \quad \mathbf{j} = (0, 1, 0), \quad \mathbf{k} = (0, 0, 1)$$


The linear independence or linear dependence of these vectors is determined by whether there exist nontrivial 
solutions of the vector equation 


$$k_1\mathbf{i} + k_2\mathbf{j} + k_3\mathbf{k} = \mathbf{0} \qquad (2)$$


Since the component form of this equation is 
$$(k_1, k_2, k_3) = (0, 0, 0)$$


it follows that $k_1 = k_2 = k_3 = 0$. This implies that (2) has only the trivial solution and hence that the vectors are 
linearly independent. 


EXAMPLE 2 Linear Independence in $R^3$ 


Determine whether the vectors 
$$\mathbf{v}_1 = (1, -2, 3), \quad \mathbf{v}_2 = (5, 6, -1), \quad \mathbf{v}_3 = (3, 2, 1)$$


are linearly independent or linearly dependent in $R^3$. 


Solution The linear independence or linear dependence of these vectors is determined by 
whether there exist nontrivial solutions of the vector equation 


$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + k_3\mathbf{v}_3 = \mathbf{0} \qquad (3)$$


or, equivalently, of 
$$k_1(1, -2, 3) + k_2(5, 6, -1) + k_3(3, 2, 1) = (0, 0, 0)$$
Equating corresponding components on the two sides yields the homogeneous linear system 
$$\begin{aligned} k_1 + 5k_2 + 3k_3 &= 0 \\ -2k_1 + 6k_2 + 2k_3 &= 0 \\ 3k_1 - k_2 + k_3 &= 0 \end{aligned} \qquad (4)$$
Thus, our problem reduces to determining whether this system has nontrivial solutions. There 
are various ways to do this; one possibility is to simply solve the system, which yields 
$$k_1 = -\tfrac{1}{2}t, \quad k_2 = -\tfrac{1}{2}t, \quad k_3 = t$$
(we omit the details). This shows that the system has nontrivial solutions and hence that the 
vectors are linearly dependent. A second method for obtaining the same result is to compute the 
determinant of the coefficient matrix 
$$A = \begin{bmatrix} 1 & 5 & 3 \\ -2 & 6 & 2 \\ 3 & -1 & 1 \end{bmatrix}$$
and use parts (b) and (g) of Theorem 2.3.8. We leave it for you to verify that $\det(A) = 0$, from 
which it follows that (3) has nontrivial solutions and the vectors are linearly dependent. 
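
The first method in Example 2 (solving the homogeneous system) can be sketched in Python as follows. This is an illustration only: the function name dependence_relations is hypothetical, and exact fractions are assumed. It places the vectors in the columns of a matrix, row-reduces, and returns one nontrivial coefficient vector $(k_1, \ldots, k_r)$ for each free variable; an empty result means the vectors are linearly independent.

```python
from fractions import Fraction

def dependence_relations(vectors):
    """Basis for all (k1, ..., kr) with k1*v1 + ... + kr*vr = 0."""
    n_rows, n_cols = len(vectors[0]), len(vectors)
    # Column j of the coefficient matrix is the vector v_(j+1).
    m = [[Fraction(vectors[j][i]) for j in range(n_cols)] for i in range(n_rows)]
    pivots, row = [], 0
    for col in range(n_cols):
        pivot = next((r for r in range(row, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[row], m[pivot] = m[pivot], m[row]
        lead = m[row][col]
        m[row] = [x / lead for x in m[row]]
        for r in range(n_rows):
            if r != row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[row])]
        pivots.append(col)
        row += 1
        if row == n_rows:
            break
    relations = []
    for free in (c for c in range(n_cols) if c not in pivots):
        k = [Fraction(0)] * n_cols
        k[free] = Fraction(1)          # set the free coefficient to t = 1
        for r, p in enumerate(pivots):
            k[p] = -m[r][free]
        relations.append(k)
    return relations

v1, v2, v3 = (1, -2, 3), (5, 6, -1), (3, 2, 1)
relations = dependence_relations([v1, v2, v3])  # one relation: k = (-1/2, -1/2, 1)
```

The returned coefficients correspond to taking $t = 1$ in the general solution $k_1 = -\tfrac{1}{2}t$, $k_2 = -\tfrac{1}{2}t$, $k_3 = t$.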


In Example 2, what relationship do you see 
between the components of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ and 
the columns of the coefficient matrix A? 


EXAMPLE 3 Linear Independence in $R^4$ 


Determine whether the vectors 
$$\mathbf{v}_1 = (1, 2, 2, -1), \quad \mathbf{v}_2 = (4, 9, 9, -4), \quad \mathbf{v}_3 = (5, 8, 9, -5)$$


in $R^4$ are linearly dependent or linearly independent. 


Solution The linear independence or linear dependence of these vectors is determined by 
whether there exist nontrivial solutions of the vector equation 


$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + k_3\mathbf{v}_3 = \mathbf{0}$$
or, equivalently, of 
$$k_1(1, 2, 2, -1) + k_2(4, 9, 9, -4) + k_3(5, 8, 9, -5) = (0, 0, 0, 0)$$
Equating corresponding components on the two sides yields the homogeneous linear system 
$$\begin{aligned} k_1 + 4k_2 + 5k_3 &= 0 \\ 2k_1 + 9k_2 + 8k_3 &= 0 \\ 2k_1 + 9k_2 + 9k_3 &= 0 \\ -k_1 - 4k_2 - 5k_3 &= 0 \end{aligned}$$
We leave it for you to show that this system has only the trivial solution 
$$k_1 = 0, \quad k_2 = 0, \quad k_3 = 0$$


from which you can conclude that $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are linearly independent. 
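
When the number of vectors differs from the number of components, as in Example 3, the determinant test is not available, but elimination still works. The Python sketch below is an illustration under the assumption that rank is an acceptable stand-in for solving the system: r vectors are linearly independent exactly when the matrix having them as rows has rank r. The function names are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Number of nonzero rows after forward elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_linearly_independent(vectors):
    """True if the only solution of k1*v1 + ... + kr*vr = 0 is the trivial one."""
    return rank(vectors) == len(vectors)

example_3 = [(1, 2, 2, -1), (4, 9, 9, -4), (5, 8, 9, -5)]
example_2 = [(1, -2, 3), (5, 6, -1), (3, 2, 1)]
independent_3 = is_linearly_independent(example_3)  # True
independent_2 = is_linearly_independent(example_2)  # False
```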


EXAMPLE 4 An Important Linearly Independent Set in $P_n$ 


Show that the polynomials 

$$1, \quad x, \quad x^2, \quad \ldots, \quad x^n$$

form a linearly independent set in $P_n$. 


Solution For convenience, let us denote the polynomials as 

$$\mathbf{p}_0 = 1, \quad \mathbf{p}_1 = x, \quad \mathbf{p}_2 = x^2, \quad \ldots, \quad \mathbf{p}_n = x^n$$


We must show that the vector equation 
$$a_0\mathbf{p}_0 + a_1\mathbf{p}_1 + a_2\mathbf{p}_2 + \cdots + a_n\mathbf{p}_n = \mathbf{0} \qquad (5)$$


has only the trivial solution 

$$a_0 = a_1 = a_2 = \cdots = a_n = 0$$


But (5) is equivalent to the statement that 
$$a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = 0 \qquad (6)$$


for all x in $(-\infty, \infty)$, so we must show that this holds if and only if each coefficient in (6) is zero. 
To see that this is so, recall from algebra that a nonzero polynomial of degree n has at most n 
distinct roots. That being the case, each coefficient in (6) must be zero, for otherwise the left side of 
the equation would be a nonzero polynomial with infinitely many roots. Thus, (5) has only the 
trivial solution. 


The following example shows that the problem of determining whether a given set of vectors in $P_n$ is linearly 
independent or linearly dependent can be reduced to determining whether a certain set of vectors in $R^n$ is 
linearly dependent or independent. 


EXAMPLE 5 Linear Independence of Polynomials 


Determine whether the polynomials 

$$\mathbf{p}_1 = 1 - x, \quad \mathbf{p}_2 = 5 + 3x - 2x^2, \quad \mathbf{p}_3 = 1 + 3x - x^2$$

are linearly dependent or linearly independent in $P_2$. 


Solution The linear independence or linear dependence of these vectors is determined by 
whether there exist nontrivial solutions of the vector equation 


$$k_1\mathbf{p}_1 + k_2\mathbf{p}_2 + k_3\mathbf{p}_3 = \mathbf{0} \qquad (7)$$
This equation can be written as 
$$k_1(1 - x) + k_2(5 + 3x - 2x^2) + k_3(1 + 3x - x^2) = 0 \qquad (8)$$


or, equivalently, as 
$$(k_1 + 5k_2 + k_3) + (-k_1 + 3k_2 + 3k_3)x + (-2k_2 - k_3)x^2 = 0$$


Since this equation must be satisfied by all x in $(-\infty, \infty)$, each coefficient must be zero (as 
explained in the previous example). Thus, the linear dependence or independence of the given 
polynomials hinges on whether the following linear system has a nontrivial solution: 

$$\begin{aligned} k_1 + 5k_2 + k_3 &= 0 \\ -k_1 + 3k_2 + 3k_3 &= 0 \\ -2k_2 - k_3 &= 0 \end{aligned} \qquad (9)$$

We leave it for you to show that this linear system has nontrivial solutions, either by solving it 
directly or by showing that the coefficient matrix has determinant zero. Thus, the set 
$\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ is linearly dependent. 
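
The reduction in Example 5 amounts to replacing each polynomial by its vector of coefficients. The Python sketch below illustrates this; the names combine and det3 are hypothetical, and the particular relation $3\mathbf{p}_1 - \mathbf{p}_2 + 2\mathbf{p}_3 = \mathbf{0}$ is one nontrivial solution of system (9) worked out here rather than taken from the text.

```python
# Each polynomial a0 + a1*x + a2*x^2 is stored as the tuple (a0, a1, a2).
p1 = (1, -1, 0)    # 1 - x
p2 = (5, 3, -2)    # 5 + 3x - 2x^2
p3 = (1, 3, -1)    # 1 + 3x - x^2

def combine(coeffs, polys):
    """Coefficient tuple of k1*p1 + k2*p2 + ... (same degree bound for all)."""
    return tuple(sum(k * p[i] for k, p in zip(coeffs, polys))
                 for i in range(len(polys[0])))

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Coefficient matrix of system (9): column j holds the coefficients of p_(j+1).
A = [[p1[i], p2[i], p3[i]] for i in range(3)]

determinant = det3(A)                             # 0, so nontrivial solutions exist
relation = combine((3, -1, 2), (p1, p2, p3))      # (0, 0, 0): 3*p1 - p2 + 2*p3 = 0
```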


In Example 5, what relationship do you see 
between the coefficients of the given 
polynomials and the column vectors of the 
coefficient matrix of system 9? 


An Alternative Interpretation of Linear Independence 


The terms linearly dependent and linearly independent are intended to indicate whether the vectors in a given 
set are interrelated in some way. The following theorem, whose proof is deferred to the end of this section, 
makes this idea more precise. 


THEOREM 4.3.1 


A set S with two or more vectors is 


(a) Linearly dependent if and only if at least one of the vectors in S is expressible as a linear 
combination of the other vectors in S. 


(b) Linearly independent if and only if no vector in S is expressible as a linear combination of the 
other vectors in S. 


EXAMPLE 6 Example 1 Revisited 


In Example 1 we showed that the standard unit vectors in $R^n$ are linearly independent. Thus, it 
follows from Theorem 4.3.1 that none of these vectors is expressible as a linear combination of 
the others. To illustrate this in $R^3$, suppose, for example, that 


$$\mathbf{k} = k_1\mathbf{i} + k_2\mathbf{j}$$
or in terms of components that 
$$(0, 0, 1) = (k_1, k_2, 0)$$


Since this equation cannot be satisfied by any values of $k_1$ and $k_2$, there is no way to express $\mathbf{k}$ 
as a linear combination of $\mathbf{i}$ and $\mathbf{j}$. Similarly, $\mathbf{i}$ is not expressible as a linear combination of $\mathbf{j}$ and 
$\mathbf{k}$, and $\mathbf{j}$ is not expressible as a linear combination of $\mathbf{i}$ and $\mathbf{k}$. 


EXAMPLE 7 Example 2 Revisited 


In Example 2 we saw that the vectors 
$$\mathbf{v}_1 = (1, -2, 3), \quad \mathbf{v}_2 = (5, 6, -1), \quad \mathbf{v}_3 = (3, 2, 1)$$


are linearly dependent. Thus, it follows from Theorem 4.3.1 that at least one of these vectors is 
expressible as a linear combination of the other two. We leave it for you to confirm that these 
vectors satisfy the equation 


$$\tfrac{1}{2}\mathbf{v}_1 + \tfrac{1}{2}\mathbf{v}_2 - \mathbf{v}_3 = \mathbf{0}$$
from which it follows, for example, that 
$$\mathbf{v}_3 = \tfrac{1}{2}\mathbf{v}_1 + \tfrac{1}{2}\mathbf{v}_2$$


Sets with One or Two Vectors 


The following basic theorem is concerned with the linear independence and linear dependence of sets with 
one or two vectors and sets that contain the zero vector. 


THEOREM 4.3.2 


(a) A finite set that contains $\mathbf{0}$ is linearly dependent. 
(b) A set with exactly one vector is linearly independent if and only if that vector is not $\mathbf{0}$. 


(c) A set with exactly two vectors is linearly independent if and only if neither vector is a scalar 
multiple of the other. 


Jozef Hoéné de Wronski (1778-1853) 


Historical Note The Polish-French mathematician Jozef Hoéné de Wronski was born Jozef Hoéné 
and adopted the name Wronski after he married. Wronski's life was fraught with controversy and 
conflict, which some say was due to his psychopathic tendencies and his exaggeration of the 
importance of his own work. Although Wronski's work was dismissed as rubbish for many years, and 
much of it was indeed erroneous, some of his ideas contained hidden brilliance and have survived. 
Among other things, Wronski designed a caterpillar vehicle to compete with trains (though it was 
never manufactured) and did research on the famous problem of determining the longitude of a ship at 
sea. His final years were spent in poverty. 
[Image: wikipedia] 


We will prove part (a) and leave the rest as exercises. 


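Proof (a) For any vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$, the set $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r, \mathbf{0}\}$ is linearly dependent since the 
equation 
$$0\mathbf{v}_1 + 0\mathbf{v}_2 + \cdots + 0\mathbf{v}_r + 1(\mathbf{0}) = \mathbf{0}$$


expresses $\mathbf{0}$ as a linear combination of the vectors in S with coefficients that are not all zero. 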


EXAMPLE 8 Linear Independence of Two Functions 


The functions $\mathbf{f}_1 = x$ and $\mathbf{f}_2 = \sin x$ are linearly independent vectors in $F(-\infty, \infty)$ since 
neither function is a scalar multiple of the other. On the other hand, the two functions 
$\mathbf{g}_1 = \sin 2x$ and $\mathbf{g}_2 = \sin x \cos x$ are linearly dependent because the trigonometric identity 
$\sin 2x = 2 \sin x \cos x$ reveals that $\mathbf{g}_1$ and $\mathbf{g}_2$ are scalar multiples of each other. 


A Geometric Interpretation of Linear Independence 


Linear independence has the following useful geometric interpretations in $R^2$ and $R^3$: 


• Two vectors in $R^2$ or $R^3$ are linearly independent if and only if they do not lie on the same line when they 
have their initial points at the origin. Otherwise one would be a scalar multiple of the other (Figure 4.3.3). 


(a) Linearly dependent (b) Linearly dependent (c) Linearly independent 


Figure 4.3.3 


• Three vectors in $R^3$ are linearly independent if and only if they do not lie in the same plane when they have 
their initial points at the origin. Otherwise at least one would be a linear combination of the other two 
(Figure 4.3.4). 


(a) Linearly dependent (6) Linearly dependent (c) Linearly independent 


Figure 4.3.4 


At the beginning of this section we observed that a third coordinate axis in $R^2$ is superfluous by showing that 
a unit vector along such an axis would have to be expressible as a linear combination of unit vectors along the 
positive x- and y-axis. That result is a consequence of the next theorem, which shows that there can be at most 
n vectors in any linearly independent set in $R^n$. 


It follows from Theorem 4.3.3, for example, 
that a set in $R^2$ with more than two vectors is 
linearly dependent and a set in $R^3$ with more 
than three vectors is linearly dependent. 


THEOREM 4.3.3 


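Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ be a set of vectors in $R^n$. If $r > n$, then S is linearly dependent. 


Proof Suppose that 

$$\begin{aligned} \mathbf{v}_1 &= (v_{11}, v_{12}, \ldots, v_{1n}) \\ \mathbf{v}_2 &= (v_{21}, v_{22}, \ldots, v_{2n}) \\ &\ \ \vdots \\ \mathbf{v}_r &= (v_{r1}, v_{r2}, \ldots, v_{rn}) \end{aligned}$$
and consider the equation 

$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0}$$


If we express both sides of this equation in terms of components and then equate the corresponding 
components, we obtain the system 


$$\begin{aligned} v_{11}k_1 + v_{21}k_2 + \cdots + v_{r1}k_r &= 0 \\ v_{12}k_1 + v_{22}k_2 + \cdots + v_{r2}k_r &= 0 \\ &\ \ \vdots \\ v_{1n}k_1 + v_{2n}k_2 + \cdots + v_{rn}k_r &= 0 \end{aligned}$$
This is a homogeneous system of n equations in the r unknowns $k_1, \ldots, k_r$. Since $r > n$, it follows from 
Theorem 1.2.2 that the system has nontrivial solutions. Therefore, $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a linearly 
dependent set. 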


CALCULUS REQUIRED 
Linear Independence of Functions 


Sometimes linear dependence of functions can be deduced from known identities. For example, the functions 
$$\mathbf{f}_1 = \sin^2 x, \quad \mathbf{f}_2 = \cos^2 x, \quad \text{and} \quad \mathbf{f}_3 = 5$$
form a linearly dependent set in $F(-\infty, \infty)$, since the equation 
$$5\mathbf{f}_1 + 5\mathbf{f}_2 - \mathbf{f}_3 = 5\sin^2 x + 5\cos^2 x - 5 = 5(\sin^2 x + \cos^2 x) - 5 = 0$$
expresses $\mathbf{0}$ as a linear combination of $\mathbf{f}_1$, $\mathbf{f}_2$, and $\mathbf{f}_3$ with coefficients that are not all zero. 
Unfortunately, there is no general method that can be used to determine whether a set of functions is linearly 
independent or linearly dependent. However, there does exist a theorem that is useful for establishing linear 
independence in certain circumstances. The following definition will be useful for discussing that theorem. 


DEFINITION 2 


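If $\mathbf{f}_1 = f_1(x)$, $\mathbf{f}_2 = f_2(x)$, ..., $\mathbf{f}_n = f_n(x)$ are functions that are $n - 1$ times differentiable on the 
interval $(-\infty, \infty)$, then the determinant 


$$W(x) = \begin{vmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{vmatrix}$$


is called the Wronskian of $f_1, f_2, \ldots, f_n$. 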


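Suppose for the moment that $\mathbf{f}_1 = f_1(x)$, $\mathbf{f}_2 = f_2(x)$, ..., $\mathbf{f}_n = f_n(x)$ are linearly dependent vectors in 
$C^{(n-1)}(-\infty, \infty)$. This implies that for certain values of the coefficients the vector equation 
$$k_1\mathbf{f}_1 + k_2\mathbf{f}_2 + \cdots + k_n\mathbf{f}_n = \mathbf{0}$$


has a nontrivial solution, or equivalently that the equation 


$$k_1 f_1(x) + k_2 f_2(x) + \cdots + k_n f_n(x) = 0$$
is satisfied for all x in $(-\infty, \infty)$. Using this equation together with those that result by differentiating it 
$n - 1$ times yields the linear system 


$$\begin{aligned} k_1 f_1(x) + k_2 f_2(x) + \cdots + k_n f_n(x) &= 0 \\ k_1 f_1'(x) + k_2 f_2'(x) + \cdots + k_n f_n'(x) &= 0 \\ &\ \ \vdots \\ k_1 f_1^{(n-1)}(x) + k_2 f_2^{(n-1)}(x) + \cdots + k_n f_n^{(n-1)}(x) &= 0 \end{aligned}$$


Thus, the linear dependence of $\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_n$ implies that the linear system 


$$\begin{bmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (10)$$


has a nontrivial solution. But this implies that the determinant of the coefficient matrix of (10) is zero for every 
such x. Since this determinant is the Wronskian of $f_1, f_2, \ldots, f_n$, we have established the following result. 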


THEOREM 4.3.4 


If the functions $\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_n$ have $n - 1$ continuous derivatives on the interval $(-\infty, \infty)$, and if the 
Wronskian of these functions is not identically zero on $(-\infty, \infty)$, then these functions form a 
linearly independent set of vectors in $C^{(n-1)}(-\infty, \infty)$. 


In Example 8 we showed that x and sin x are linearly independent functions by observing that neither is a 
scalar multiple of the other. The following example shows how to obtain the same result using the Wronskian 
(though it is a more complicated procedure in this particular case). 


EXAMPLE 9 Linear Independence Using the Wronskian 
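Use the Wronskian to show that $\mathbf{f}_1 = x$ and $\mathbf{f}_2 = \sin x$ are linearly independent. 
Solution The Wronskian is 


$$W(x) = \begin{vmatrix} x & \sin x \\ 1 & \cos x \end{vmatrix} = x\cos x - \sin x$$


This function is not identically zero on the interval $(-\infty, \infty)$ since, for example, 


$$W\!\left(\frac{\pi}{2}\right) = \frac{\pi}{2}\cos\!\left(\frac{\pi}{2}\right) - \sin\!\left(\frac{\pi}{2}\right) = -1$$


Thus, the functions are linearly independent. 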

WARNING 


The converse of Theorem 4.3.4 is false. If the 
Wronskian of $\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_n$ is identically zero 
on $(-\infty, \infty)$, then no conclusion can be 
reached about the linear independence of 
$\{\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_n\}$; this set of vectors may be 
linearly independent or linearly dependent. 


EXAMPLE 10 Linear Independence Using the Wronskian 
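Use the Wronskian to show that $\mathbf{f}_1 = 1$, $\mathbf{f}_2 = e^x$, and $\mathbf{f}_3 = e^{2x}$ are linearly independent. 


Solution The Wronskian is 
$$W(x) = \begin{vmatrix} 1 & e^x & e^{2x} \\ 0 & e^x & 2e^{2x} \\ 0 & e^x & 4e^{2x} \end{vmatrix} = 2e^{3x}$$


This function is obviously not identically zero on $(-\infty, \infty)$, so $\mathbf{f}_1$, $\mathbf{f}_2$, and $\mathbf{f}_3$ form a linearly 
independent set. 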
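
The Wronskians in Examples 9 and 10 can be spot-checked numerically. The Python sketch below is an illustration only: the names det and wronskian_at are hypothetical, the derivatives are entered by hand rather than computed symbolically, and floating-point arithmetic is assumed, so values are approximate. By Theorem 4.3.4, a single point at which the Wronskian is nonzero is enough to establish linear independence.

```python
import math

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def wronskian_at(table, x):
    """Evaluate the Wronskian at x; table[i][j] is the i-th derivative of f_j."""
    return det([[f(x) for f in row] for row in table])

# Example 9: f1 = x, f2 = sin x, with first derivatives supplied by hand.
example_9 = [[lambda x: x,   math.sin],
             [lambda x: 1.0, math.cos]]

# Example 10: f1 = 1, f2 = e^x, f3 = e^(2x), with first and second derivatives.
example_10 = [[lambda x: 1.0, math.exp, lambda x: math.exp(2 * x)],
              [lambda x: 0.0, math.exp, lambda x: 2 * math.exp(2 * x)],
              [lambda x: 0.0, math.exp, lambda x: 4 * math.exp(2 * x)]]

w9 = wronskian_at(example_9, math.pi / 2)   # approximately -1
w10 = wronskian_at(example_10, 0.0)         # approximately 2 = 2e^0
```

In view of the warning above, a Wronskian that evaluates to zero at the sampled points would not justify any conclusion about dependence.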


OPTIONAL 


We will close this section by proving part (a) of Theorem 4.3.1. We will leave the proof of part (b) as an 
exercise. 


Proof of Theorem 4.3.1 (a) Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ be a set with two or more vectors. If we assume 
that S is linearly dependent, then there are scalars $k_1, k_2, \ldots, k_r$, not all zero, such that 


$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0} \qquad (11)$$


To be specific, suppose that $k_1 \neq 0$. Then (11) can be rewritten as 


$$\mathbf{v}_1 = \left(-\frac{k_2}{k_1}\right)\mathbf{v}_2 + \cdots + \left(-\frac{k_r}{k_1}\right)\mathbf{v}_r$$


which expresses $\mathbf{v}_1$ as a linear combination of the other vectors in S. Similarly, if $k_j \neq 0$ in (11) for some 
$j = 2, 3, \ldots, r$, then $\mathbf{v}_j$ is expressible as a linear combination of the other vectors in S. 


Conversely, let us assume that at least one of the vectors in S is expressible as a linear combination of the 
other vectors. To be specific, suppose that 

$$\mathbf{v}_1 = c_2\mathbf{v}_2 + c_3\mathbf{v}_3 + \cdots + c_r\mathbf{v}_r$$
so 

$$\mathbf{v}_1 - c_2\mathbf{v}_2 - c_3\mathbf{v}_3 - \cdots - c_r\mathbf{v}_r = \mathbf{0}$$
It follows that S is linearly dependent since the equation 
$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0}$$
is satisfied by 
$$k_1 = 1, \quad k_2 = -c_2, \quad \ldots, \quad k_r = -c_r$$


which are not all zero. The proof in the case where some vector other than $\mathbf{v}_1$ is expressible as a linear 
combination of the other vectors in S is similar. 


Concept Review 

• Trivial solution 

• Linearly independent set 

• Linearly dependent set 

• Wronskian 

Skills 

• Determine whether a set of vectors is linearly independent or linearly dependent. 

• Express one vector in a linearly dependent set as a linear combination of the other vectors in the set. 

• Use the Wronskian to show that a set of functions is linearly independent. 


Exercise Set 4.3 


1. Explain why the following are linearly dependent sets of vectors. (Solve this problem by inspection.) 
(a) $\mathbf{u}_1 = (-1, 2, 4)$ and $\mathbf{u}_2 = (5, -10, -20)$ in $R^3$ 
(b) $\mathbf{u}_1 = (3, -1)$, $\mathbf{u}_2 = (4, 5)$, $\mathbf{u}_3 = (-4, 7)$ in $R^2$ 
(c) $\mathbf{p}_1 = 3 - 2x + x^2$ and $\mathbf{p}_2 = 6 - 4x + 2x^2$ in $P_2$ 


d —3 4 3 —4]|. 
® a=| D 5 |e s=|_3 5 | in Ma 


Answer: 


(a) $\mathbf{u}_2$ is a scalar multiple of $\mathbf{u}_1$. 

(b) The vectors are linearly dependent by Theorem 4.3.3. 
(c) $\mathbf{p}_2$ is a scalar multiple of $\mathbf{p}_1$. 

(d) B is a scalar multiple of A. 


2. Which of the following sets of vectors in $R^3$ are linearly dependent? 
(a) (4, -1, 2), (-4, 10, 2) 


(by Ne Moi ele (19) 
(co), = 13). 0, 1) 
(gy bes AS ee, Lee DO =e) 


3. Which of the following sets of vectors in $R^4$ are linearly dependent? 

(a) (3, 8, 7, -3), (1, 5, 3, -1), (2, -1, 2, 6), (1, 4, 0, 3) 

(b) (0, 0, 2, 2), (3, 3, 0, 0), (1, 1, 0, -1) 

(c) (0, 3, -3, -6), (-2, 0, 0, -6), (0, -4, -2, -2), (0, -8, 4, -4) 
(d) (3, 0, -3, 6), (0, 2, 3, 1), (0, -2, -2, 0), (-2, 1, 2, 1) 


Answer: 


None 


4. Which of the following sets of vectors in $P_2$ are linearly dependent? 


(a) $2 - x + 4x^2$, $3 + 6x + 2x^2$, $2 + 10x - 4x^2$ 

(b) $3 + x + x^2$, $2 - x + 5x^2$, $4 - 3x^2$ 

(c) $6 - x^2$ 

(d) $1 + 3x + 3x^2$, $x + 4x^2$, $5 + 6x + 3x^2$, $7 + 2x - x^2$ 


5. Assume that v_1, v_2, and v_3 are vectors in R^3 that have their initial points at the origin. In each part,
determine whether the three vectors lie in a plane.
(a) v_1 = (2, -2, 0), v_2 = (6, 1, 4), v_3 = (2, 0, -4)
(b) v_1 = (-6, 7, 2), v_2 = (3, 2, 4), v_3 = (4, -1, 2)


Answer: 


(a) They do not lie in a plane. 
(b) They do lie in a plane. 


6. Assume that v_1, v_2, and v_3 are vectors in R^3 that have their initial points at the origin. In each part,
determine whether the three vectors lie on the same line.

(a) v_1 = (-1, 2, 3), v_2 = (2, -4, -6), v_3 = (-3, 6, 0)
(b) v_1 = (2, -1, 4), v_2 = (4, 2, 3), v_3 = (2, 7, -6)

(c) v_1 = (4, 6, 8), v_2 = (2, 3, 4), v_3 = (-2, -3, -4)


7. (a) Show that the three vectors v_1 = (0, 3, 1, -1), v_2 = (6, 0, 5, 1), and v_3 = (4, -7, 1, 3) form a
linearly dependent set in R^4.


(b) Express each vector in part (a) as a linear combination of the other two.


Answer:


(b) v_1 = (2/7) v_2 - (3/7) v_3, v_2 = (7/2) v_1 + (3/2) v_3, v_3 = -(7/3) v_1 + (2/3) v_2


8. (a) Show that the three vectors v_1 = (1, 2, 3, 4), v_2 = (0, 1, 0, -1), and v_3 = (1, 3, 3, 3) form a
linearly dependent set in R^4.


(b) Express each vector in part (a) as a linear combination of the other two. 


9. For which real values of λ do the following vectors form a linearly dependent set in R^3?


v_1 = (λ, -1/2, -1/2), v_2 = (-1/2, λ, -1/2), v_3 = (-1/2, -1/2, λ)


10. Show that if {v_1, v_2, v_3} is a linearly independent set of vectors, then so are
{v_1, v_2}, {v_1, v_3}, {v_2, v_3}, {v_1}, {v_2}, and {v_3}.

11. Show that if S = {v_1, v_2, ..., v_r} is a linearly independent set of vectors, then so is every nonempty
subset of S.


12. Show that if S = {v_1, v_2, v_3} is a linearly dependent set of vectors in a vector space V, and v_4 is any
vector in V that is not in S, then {v_1, v_2, v_3, v_4} is also linearly dependent.


13. Show that if S = {v_1, v_2, ..., v_r} is a linearly dependent set of vectors in a vector space V, and if
v_{r+1}, ..., v_n are any vectors in V that are not in S, then {v_1, v_2, ..., v_r, v_{r+1}, ..., v_n} is also linearly
dependent.


14. Show that in P_2 every set with more than three vectors is linearly dependent.


15. Show that if {v_1, v_2} is linearly independent and v_3 does not lie in span{v_1, v_2}, then {v_1, v_2, v_3} is
linearly independent.


16. Prove: For any vectors u, v, and w in a vector space V, the vectors u - v, v - w, and w - u form a
linearly dependent set.


17. Prove: The space spanned by two vectors in R^3 is a line through the origin, a plane through the origin, or
the origin itself.


18. Under what conditions is a set with one vector linearly independent? 


19. Are the vectors v_1, v_2, and v_3 in part (a) of the accompanying figure linearly independent? What about
those in part (b)? Explain.


Figure Ex-19 (parts (a) and (b))


Answer:


(a) They are linearly independent since v_1, v_2, and v_3 do not lie in the same plane when they are placed
with their initial points at the origin.


(b) They are not linearly independent since v_1, v_2, and v_3 lie in the same plane when they are placed
with their initial points at the origin.


20. By using appropriate identities, where required, determine which of the following sets of vectors in
F(-∞, ∞) are linearly dependent.

(a) 6, 3 sin^2 x, 2 cos^2 x

(b) x, cos x

(c) 1, sin x, sin 2x

(d) cos 2x, sin^2 x, cos^2 x

(e) (3 - x)^2, x^2 - 6x, 5

(f) 0, cos^3 πx, sin^5 3πx


21. The functions f_1(x) = x and f_2(x) = cos x are linearly independent in F(-∞, ∞) because neither
function is a scalar multiple of the other. Confirm the linear independence using Wronski's test.


Answer:
W(x) = -x sin x - cos x ≠ 0 for some x.


22. The functions f_1(x) = sin x and f_2(x) = cos x are linearly independent in F(-∞, ∞) because
neither function is a scalar multiple of the other. Confirm the linear independence using Wronski's test.


23. (Calculus required) Use the Wronskian to show that the following sets of vectors are linearly
independent.


(a) 1, x, e^x
(b) 1, x, x^2


Answer:

(a) W(x) = e^x ≠ 0

(b) W(x) = 2 ≠ 0


24. Show that the functions f_1(x) = e^x, f_2(x) = x e^x, and f_3(x) = x^2 e^x are linearly independent.


25. Show that the functions f_1(x) = sin x, f_2(x) = cos x, and f_3(x) = x cos x are linearly independent.

Answer:


W(x) = 2 sin x ≠ 0 for some x.


26. Use part (a) of Theorem 4.3.1 to prove part (b).


27. Prove part (b) of Theorem 4.3.2. 


28. (a) In Example 1 we showed that the mutually orthogonal vectors i, j, and k form a linearly independent
set of vectors in R^3. Do you think that every set of three nonzero mutually orthogonal vectors in R^3 is
linearly independent? Justify your conclusion with a geometric argument.


(b) Justify your conclusion with an algebraic argument. [Hint: Use dot products.]

True-False Exercises 


In parts (a)-(h) determine whether the statement is true or false, and justify your answer. 
(a) A set containing a single vector is linearly independent. 
Answer: 


False 


(b) The set of vectors {v, kv} is linearly dependent for every scalar k.
Answer: 


True 


(c) Every linearly dependent set contains the zero vector. 
Answer: 


False 


(d) If the set of vectors {v_1, v_2, v_3} is linearly independent, then {kv_1, kv_2, kv_3} is also linearly
independent for every nonzero scalar k.


Answer: 


True 


(e) If v_1, ..., v_n are linearly dependent nonzero vectors, then at least one vector v_k is a unique linear
combination of v_1, ..., v_{k-1}.


Answer: 


True 


(f) The set of 2 × 2 matrices that contain exactly two 1's and two 0's is a linearly independent set in M_22.
Answer: 


False 


(g) The three polynomials (x — 1) (x + 2), x(x + 2), and x(x — 1) are linearly independent. 
Answer: 


True 


(h) The functions f_1 and f_2 are linearly dependent if there is a real number x so that
k_1 f_1(x) + k_2 f_2(x) = 0 for some scalars k_1 and k_2.


Answer: 


False 




4.4 Coordinates and Basis 


We usually think of a line as being one-dimensional, a plane as two-dimensional, and the space around us as three- 
dimensional. It is the primary goal of this section and the next to make this intuitive notion of dimension precise. 
In this section we will discuss coordinate systems in general vector spaces and lay the groundwork for a precise 
definition of dimension in the next section. 


Coordinate Systems in Linear Algebra 


In analytic geometry we learned to use rectangular coordinate systems to create a one-to-one correspondence 
between points in 2-space and ordered pairs of real numbers and between points in 3-space and ordered triples of 
real numbers (Figure 4.4.1). Although rectangular coordinate systems are common, they are not essential. For 
example, Figure 4.4.2 shows coordinate systems in 2-space and 3-space in which the coordinate axes are not 
mutually perpendicular. 


Figure 4.4.1 Coordinates of P in a rectangular coordinate system in 2-space and in 3-space.


Figure 4.4.2 Coordinates of P in a nonrectangular coordinate system in 2-space and in 3-space.


In linear algebra coordinate systems are commonly specified using vectors rather than coordinate axes. For 
example, in Figure 4.4.3 we have recreated the coordinate systems in Figure 4.4.2 by using unit vectors to identify 
the positive directions and then attaching coordinates to a point P using the scalar coefficients in the equations 


OP = a u_1 + b u_2   and   OP = a u_1 + b u_2 + c u_3


Figure 4.4.3


Units of measurement are essential ingredients of any coordinate system. In geometry problems one tries to use 
the same unit of measurement on all axes to avoid distorting the shapes of figures. This is less important in 
applications where coordinates represent physical quantities with diverse units (for example, time in seconds on 
one axis and temperature in degrees Celsius on another axis). To allow for this level of generality, we will relax 
the requirement that unit vectors be used to identify the positive directions and require only that those vectors be 
linearly independent. We will refer to these as the “basis vectors” for the coordinate system. In summary, it is the
directions of the basis vectors that establish the positive directions, and it is the lengths of the basis vectors that 
establish the spacing between the integer points on the axes (Figure 4.4.4). 


Figure 4.4.4 Four coordinate systems: equal spacing with perpendicular axes; unequal spacing with perpendicular
axes; equal spacing with skew axes; unequal spacing with skew axes.


Basis for a Vector Space 


The following definition will make the preceding ideas more precise and will enable us to extend the concept of a 
coordinate system to general vector spaces. 


Note that in Definition 1 we have required a basis 
to have finitely many vectors. Some authors call 
this a finite basis, but we will not use this 
terminology. 


DEFINITION 1 
If V is any vector space and S = {v_1, v_2, ..., v_n} is a finite set of vectors in V, then S is called a basis for
V if the following two conditions hold:


(a) S is linearly independent.
(b) S spans V.


If you think of a basis as describing a coordinate system for a vector space V, then part (a) of this definition
guarantees that there is no interrelationship between the basis vectors, and part (b) guarantees that there are 
enough basis vectors to provide coordinates for all vectors in V. Here are some examples. 


EXAMPLE 1 The Standard Basis for R^n


Recall from Example 11 of Section 4.2 that the standard unit vectors
e_1 = (1, 0, 0, ..., 0), e_2 = (0, 1, 0, ..., 0), ..., e_n = (0, 0, 0, ..., 1)
span R^n and from Example 1 of Section 4.3 that they are linearly independent. Thus, they form a
basis for R^n that we call the standard basis for R^n. In particular,
i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)
is the standard basis for R^3.


EXAMPLE 2 The Standard Basis for P_n


Show that S = {1, x, x^2, ..., x^n} is a basis for the vector space P_n of polynomials of degree n or
less.

Solution We must show that the polynomials in S are linearly independent and span P_n. Let us
denote these polynomials by


p_0 = 1, p_1 = x, p_2 = x^2, ..., p_n = x^n
We showed in Example 13 of Section 4.2 that these vectors span P_n and in Example 4 of Section
4.3 that they are linearly independent. Thus, they form a basis for P_n that we call the standard basis
for P_n.


EXAMPLE 3 Another Basis for R^3
Show that the vectors v_1 = (1, 2, 1), v_2 = (2, 9, 0), and v_3 = (3, 3, 4) form a basis for R^3.


Solution We must show that these vectors are linearly independent and span R^3. To prove linear
independence we must show that the vector equation


c_1 v_1 + c_2 v_2 + c_3 v_3 = 0   (1)


has only the trivial solution; and to prove that the vectors span R^3 we must show that every vector
b = (b_1, b_2, b_3) in R^3 can be expressed as


c_1 v_1 + c_2 v_2 + c_3 v_3 = b   (2)


By equating corresponding components on the two sides, these two equations can be expressed as
the linear systems


 c_1 + 2c_2 + 3c_3 = 0          c_1 + 2c_2 + 3c_3 = b_1
2c_1 + 9c_2 + 3c_3 = 0   and   2c_1 + 9c_2 + 3c_3 = b_2   (3)
 c_1        + 4c_3 = 0          c_1        + 4c_3 = b_3


(verify). Thus, we have reduced the problem to showing that in (3) the homogeneous system has only
the trivial solution and that the nonhomogeneous system is consistent for all values of b_1, b_2, and b_3.
But the two systems have the same coefficient matrix


    [ 1  2  3 ]
A = [ 2  9  3 ]
    [ 1  0  4 ]

so it follows from parts (b), (e), and (g) of Theorem 2.3.8 that we can prove both results at the same
time by showing that det(A) ≠ 0. We leave it for you to confirm that det(A) = -1, which proves
that the vectors v_1, v_2, and v_3 form a basis for R^3.
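
The determinant check that Example 3 leaves to the reader can also be carried out mechanically. The following Python sketch is offered only as an illustration, not as part of the text's method: the function name det is an assumption introduced here, and it uses Gaussian elimination with exact fractions in place of a hand cofactor expansion. The columns of A are the vectors v_1 = (1, 2, 1), v_2 = (2, 9, 0), and v_3 = (3, 3, 4).

```python
from fractions import Fraction


def det(matrix):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    sign = 1
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot available: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign  # a row interchange changes the sign of the determinant
        result *= a[col][col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result


# Coefficient matrix of Example 3; its columns are v_1, v_2, v_3.
A = [[1, 2, 3],
     [2, 9, 3],
     [1, 0, 4]]

print(det(A))  # -1
```

Because the determinant is nonzero, the homogeneous system has only the trivial solution and the nonhomogeneous system is consistent for every b, which is the conclusion drawn in the example.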


EXAMPLE 4 The Standard Basis for M_mn


Show that the matrices


M_1 = [ 1  0 ],  M_2 = [ 0  1 ],  M_3 = [ 0  0 ],  M_4 = [ 0  0 ]
      [ 0  0 ]         [ 0  0 ]         [ 1  0 ]         [ 0  1 ]


form a basis for the vector space M_22 of 2 × 2 matrices.


Solution We must show that the matrices are linearly independent and span M_22. To prove linear
independence we must show that the equation


c_1 M_1 + c_2 M_2 + c_3 M_3 + c_4 M_4 = 0   (4)


has only the trivial solution, where 0 is the 2 × 2 zero matrix; and to prove that the matrices span
M_22 we must show that every 2 × 2 matrix


B = [ a  b ]
    [ c  d ]


can be expressed as


c_1 M_1 + c_2 M_2 + c_3 M_3 + c_4 M_4 = B   (5)


The matrix forms of Equations 4 and 5 are

c_1 [ 1  0 ] + c_2 [ 0  1 ] + c_3 [ 0  0 ] + c_4 [ 0  0 ] = [ 0  0 ]
    [ 0  0 ]       [ 0  0 ]       [ 1  0 ]       [ 0  1 ]   [ 0  0 ]

and

c_1 [ 1  0 ] + c_2 [ 0  1 ] + c_3 [ 0  0 ] + c_4 [ 0  0 ] = [ a  b ]
    [ 0  0 ]       [ 0  0 ]       [ 1  0 ]       [ 0  1 ]   [ c  d ]


which can be rewritten as


[ c_1  c_2 ] = [ 0  0 ]   and   [ c_1  c_2 ] = [ a  b ]
[ c_3  c_4 ]   [ 0  0 ]         [ c_3  c_4 ]   [ c  d ]

Since the first equation has only the trivial solution
c_1 = c_2 = c_3 = c_4 = 0
the matrices are linearly independent, and since the second equation has the solution
c_1 = a, c_2 = b, c_3 = c, c_4 = d

the matrices span M_22. This proves that the matrices M_1, M_2, M_3, M_4 form a basis for M_22.


More generally, the mn different matrices whose entries are zero except for a single entry of 1 form
a basis for M_mn, called the standard basis for M_mn.


Some writers define the empty set to be a basis 
for the zero vector space, but we will not do so. 


It is not true that every vector space has a basis in the sense of Definition 1. The simplest example is the zero 
vector space, which contains no linearly independent sets and hence no basis. The following is an example of a 
nonzero vector space that has no basis in the sense of Definition 1 because it cannot be spanned by finitely many 
vectors. 


EXAMPLE 5 A Vector Space That Has No Finite Spanning Set
Show that the vector space P_∞ of all polynomials with real coefficients has no finite spanning set.


Solution If there were a finite spanning set, say S = {p_1, p_2, ..., p_r}, then the degrees of the
polynomials in S would have a maximum value, say n; and this in turn would imply that any linear
combination of the polynomials in S would have degree at most n. Thus, there would be no way to
express the polynomial x^(n+1) as a linear combination of the polynomials in S, contradicting the fact that
the vectors in S span P_∞.


For reasons that will become clear shortly, a vector space that cannot be spanned by finitely many vectors is said 
to be infinite-dimensional, whereas those that can are said to be finite-dimensional. 


EXAMPLE 6 Some Finite- and Infinite-Dimensional Spaces


In Example 1, Example 2, and Example 4 we found bases for R^n, P_n, and M_mn, so these vector
spaces are finite-dimensional. We showed in Example 5 that the vector space P_∞ is not spanned by
finitely many vectors and hence is infinite-dimensional. In the exercises of this section and the next
we will ask you to show that the vector spaces R^∞, F(-∞, ∞), C(-∞, ∞), C^m(-∞, ∞), and
C^∞(-∞, ∞) are infinite-dimensional.


Coordinates Relative to a Basis 


Earlier in this section we drew an informal analogy between basis vectors and coordinate systems. Our next goal is 
to make this informal idea precise by defining the notion of a coordinate system in a general vector space. The 
following theorem will be our first step in that direction. 


THEOREM 4.4.1 Uniqueness of Basis Representation 


If S = {v_1, v_2, ..., v_n} is a basis for a vector space V, then every vector v in V can be expressed in the
form v = c_1 v_1 + c_2 v_2 + ... + c_n v_n in exactly one way.


Proof Since S spans V, it follows from the definition of a spanning set that every vector in V is expressible as a
linear combination of the vectors in S. To see that there is only one way to express a vector as a linear combination
of the vectors in S, suppose that some vector v can be written as


v = c_1 v_1 + c_2 v_2 + ... + c_n v_n
and also as

v = k_1 v_1 + k_2 v_2 + ... + k_n v_n
Subtracting the second equation from the first gives

0 = (c_1 - k_1) v_1 + (c_2 - k_2) v_2 + ... + (c_n - k_n) v_n
Since the right side of this equation is a linear combination of vectors in S, the linear independence of S implies
that
c_1 - k_1 = 0, c_2 - k_2 = 0, ..., c_n - k_n = 0

that is,

c_1 = k_1, c_2 = k_2, ..., c_n = k_n


Thus, the two expressions for v are the same.


Figure 4.4.5 


Sometimes it will be desirable to write a
coordinate vector as a column matrix, in which
case we will denote it using square brackets as

        [ c_1 ]
[v]_S = [ c_2 ]
        [  :  ]
        [ c_n ]

We will refer to [v]_S as a coordinate matrix and
reserve the terminology coordinate vector for the
comma delimited form (v)_S.


We now have all of the ingredients required to define the notion of “coordinates” in a general vector space V. For
motivation, observe that in R^3, for example, the coordinates (a, b, c) of a vector v are precisely the coefficients in
the formula

v = ai + bj + ck

that expresses v as a linear combination of the standard basis vectors for R^3 (see Figure 4.4.5). The following
definition generalizes this idea.


DEFINITION 2 


If S = {v_1, v_2, ..., v_n} is a basis for a vector space V, and

v = c_1 v_1 + c_2 v_2 + ... + c_n v_n
is the expression for a vector v in terms of the basis S, then the scalars c_1, c_2, ..., c_n are called the
coordinates of v relative to the basis S. The vector (c_1, c_2, ..., c_n) in R^n constructed from these
coordinates is called the coordinate vector of v relative to S; it is denoted by


(v)_S = (c_1, c_2, ..., c_n)   (6)


Remark Recall that two sets are considered to be the same if they have the same members, even if those
members are written in a different order. However, if S = {v_1, v_2, ..., v_n} is a set of basis vectors, then changing
the order in which the vectors are written would change the order of the entries in (v)_S, possibly producing a
different coordinate vector. To avoid this complication, we will make the convention that in any discussion
involving a basis S the order of the vectors in S remains fixed. Some authors call a set of basis vectors with this 
restriction an ordered basis. However, we will use this terminology only when emphasis on the order is required 
for clarity. 


Observe that (v)_S is a vector in R^n, so that once a basis S is given for a vector space V, Theorem 4.4.1 establishes a
one-to-one correspondence between vectors in V and vectors in R^n (Figure 4.4.6).


Figure 4.4.6 A one-to-one correspondence between V and R^n.


EXAMPLE 7 Coordinates Relative to the Standard Basis for R^n


In the special case where V = R^n and S is the standard basis, the coordinate vector (v)_S and the vector
v are the same; that is,


v = (v)_S
For example, in R^3 the representation of a vector v = (a, b, c) as a linear combination of the vectors in
the standard basis S = {i, j, k} is
v = ai + bj + ck


so the coordinate vector relative to this basis is (v)_S = (a, b, c), which is the same as the vector v.


EXAMPLE 8 Coordinate Vectors Relative to Standard Bases


(a) Find the coordinate vector for the polynomial
p(x) = c_0 + c_1 x + c_2 x^2 + ... + c_n x^n


relative to the standard basis for the vector space P_n.


(b) Find the coordinate vector of

B = [ a  b ]
    [ c  d ]

relative to the standard basis for M_22.


Solution


(a) The given formula for p(x) expresses this polynomial as a linear combination of the standard
basis vectors S = {1, x, x^2, ..., x^n}. Thus, the coordinate vector for p relative to S is


(p)_S = (c_0, c_1, c_2, ..., c_n)


(b) We showed in Example 4 that the representation of a vector

B = [ a  b ]
    [ c  d ]

as a linear combination of the standard basis vectors is


B = [ a  b ] = a [ 1  0 ] + b [ 0  1 ] + c [ 0  0 ] + d [ 0  0 ]
    [ c  d ]     [ 0  0 ]     [ 0  0 ]     [ 1  0 ]     [ 0  1 ]


so the coordinate vector of B relative to S is


(B)_S = (a, b, c, d)


EXAMPLE 9 Coordinates in R^3


(a) We showed in Example 3 that the vectors
v_1 = (1, 2, 1), v_2 = (2, 9, 0), v_3 = (3, 3, 4)
form a basis for R^3. Find the coordinate vector of v = (5, -1, 9) relative to the basis
S = {v_1, v_2, v_3}.


(b) Find the vector v in R^3 whose coordinate vector relative to S is (v)_S = (-1, 3, 2).


Solution


(a) To find (v)_S we must first express v as a linear combination of the vectors in S; that is, we must
find values of c_1, c_2, and c_3 such that
v = c_1 v_1 + c_2 v_2 + c_3 v_3
or, in terms of components,


(5, -1, 9) = c_1 (1, 2, 1) + c_2 (2, 9, 0) + c_3 (3, 3, 4)


Equating corresponding components gives


 c_1 + 2c_2 + 3c_3 = 5
2c_1 + 9c_2 + 3c_3 = -1
 c_1        + 4c_3 = 9
Solving this system we obtain c_1 = 1, c_2 = -1, c_3 = 2 (verify). Therefore,


(v)_S = (1, -1, 2)
(b) Using the definition of (v)_S, we obtain
v = (-1) v_1 + 3 v_2 + 2 v_3
  = (-1)(1, 2, 1) + 3(2, 9, 0) + 2(3, 3, 4) = (11, 31, 7)
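
Both parts of Example 9 can be checked with a short computation. The sketch below is an illustration under stated assumptions, not a prescribed method: the helper name solve is introduced here, it performs Gauss-Jordan elimination with exact fractions, and it assumes the coefficient matrix is invertible (true here because its columns form a basis). Part (a) solves for the coordinates; part (b) rebuilds a vector from its coordinate vector.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system A c = b by Gauss-Jordan elimination.

    Assumes A is invertible, which holds when its columns form a basis.
    """
    n = len(A)
    # Build the augmented matrix [A | b] with exact rational entries.
    m = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]  # scale to get a leading 1
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


basis = [(1, 2, 1), (2, 9, 0), (3, 3, 4)]  # v_1, v_2, v_3 from Example 3

# Part (a): the basis vectors are the columns of the coefficient matrix.
A = [[vec[i] for vec in basis] for i in range(3)]
coords = solve(A, (5, -1, 9))
print([int(c) for c in coords])  # [1, -1, 2]

# Part (b): rebuild v from the coordinate vector (-1, 3, 2).
v = [sum(c * vec[i] for c, vec in zip((-1, 3, 2), basis)) for i in range(3)]
print(v)  # [11, 31, 7]
```

Finding coordinates requires solving a linear system, whereas recovering a vector from its coordinates is only a matter of forming the linear combination.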


Concept Review 

• Basis

• Standard bases for R^n, P_n, M_mn

• Finite-dimensional

• Infinite-dimensional

• Coordinates

• Coordinate vector

Skills

• Show that a set of vectors is a basis for a vector space.

• Find the coordinates of a vector relative to a basis.

• Find the coordinate vector of a vector relative to a basis.


Exercise Set 4.4 


1. In words, explain why the following sets of vectors are not bases for the indicated vector spaces.
(a) u_1 = (1, 2), u_2 = (0, 3), u_3 = (2, 7) for R^2
(b) u_1 = (-1, 3, 2), u_2 = (6, 1, 1) for R^3
(c) p_1 = 1 + x + x^2, p_2 = x - 1 for P_2
(d) A = [ 1  1 ],  B = [  6  0 ],  C = [ 3  0 ],  D = [ 5  1 ],  E = [ 7  1 ]  for M_22
        [ 2  3 ]       [ -1  4 ]       [ 1  7 ]       [ 4  2 ]       [ 2  9 ]


Answer:


(a) A basis for R^2 has two linearly independent vectors.
(b) A basis for R^3 has three linearly independent vectors.
(c) A basis for P_2 has three linearly independent vectors.
(d) A basis for M_22 has four linearly independent vectors.


2. Which of the following sets of vectors are bases for R^2?
(a) {(2, 1), (3, 0)}
(b) {(4, 1), (-7, -8)}
(c) {(0, 0), (1, 3)}
(d) {(3, 9), (-4, -12)}

3. Which of the following sets of vectors are bases for R^3?
(a) {(1, 0, 0), (2, 2, 0), (3, 3, 3)}
(by AAs, t= 4) 44, 58) C1, 48)) 
(oy Me. Soe, 1) ee 1) 




1th oa CA =D C1233) 
Answer: 


(a), (b) 


4. Which of the following form bases for P_2?


(a) 1 - 3x + 2x^2, 1 + x + 4x^2, 1 - 7x
(b) 4 + 6x + x^2, -1 + 4x + 2x^2, 5 + 2x - x^2
(c) 1 + x + x^2, x + x^2, x^2
(d) -4 + x + 3x^2, 6 + 5x + 2x^2, 8 + 4x + x^2


5. Show that the following matrices form a basis for M_22.


3 sb [1 of [2 oh | 


6. Let V be the space spanned by v_1 = cos^2 x, v_2 = sin^2 x, v_3 = cos 2x.


(a) Show that S= {v1, v2, v3} is not a basis for V. 
(b) Find a basis for V. 


7. Find the coordinate vector of w relative to the basis S = {u_1, u_2} for R^2.


(a) u_1 = (1, 0), u_2 = (0, 1); w = (3, -7)
(b) u_1 = (2, -4), u_2 = (3, 8); w = (1, 1)
(c) u_1 = (1, 1), u_2 = (0, 2); w = (a, b)


Answer:
(a) (w)_S = (3, -7)
(b) (w)_S = (5/28, 3/14)
(c) (w)_S = (a, (b - a)/2)


8. Find the coordinate vector of w relative to the basis S = {u_1, u_2} of R^2.


(a) u_1 = (1, -1), u_2 = (1, 1); w = (1, 0)
(b) u_1 = (1, -1), u_2 = (1, 1); w = (0, 1)
(c) u_1 = (1, -1), u_2 = (1, 1); w = (1, 1)


9. Find the coordinate vector of v relative to the basis S = {v_1, v_2, v_3}.


(a) v = (2, -1, 3); v_1 = (1, 0, 0), v_2 = (2, 2, 0), v_3 = (3, 3, 3)
(b) v = (5, -12, 3); v_1 = (1, 2, 3), v_2 = (-4, 5, 6), v_3 = (7, -8, 9)


Answer:


(a) (v)_S = (3, -2, 1)
(b) (v)_S = (-2, 0, 1)


1 0 
-1 2 


| 


10. Find the coordinate vector of p relative to the basis S = {p_1, p_2, p_3}.
(a) p = 4 - 3x + x^2; p_1 = 1, p_2 = x, p_3 = x^2
(b) p = 2 - x + x^2; p_1 = 1 + x, p_2 = 1 + x^2, p_3 = x + x^2

11. Find the coordinate vector of A relative to the basis S = {A_1, A_2, A_3, A_4}.

A = [  2  0 ];  A_1 = [ -1  1 ],  A_2 = [ 1  1 ],  A_3 = [ 0  0 ],  A_4 = [ 0  0 ]
    [ -1  3 ]         [  0  0 ]         [ 0  0 ]         [ 1  0 ]         [ 0  1 ]


Answer:
(A)_S = (-1, 1, -1, 3)


In Exercises 12-13, show that {A_1, A_2, A_3, A_4} is a basis for M_22, and express A as a linear combination of the
basis vectors.


Answer:
A = A_1 - A_2 + A_3 - A_4


In Exercises 14-15, show that {p_1, p_2, p_3} is a basis for P_2, and express p as a linear combination of the basis
vectors.


14. p_1 = 1 + 2x + x^2, p_2 = 2 + 9x, p_3 = 3 + 3x + 4x^2; p = 2 + 17x - 3x^2
15. p_1 = 1 + x + x^2, p_2 = x + x^2, p_3 = x^2; p = 7 - x + 2x^2
Answer:


p = 7p_1 - 8p_2 + 3p_3

16. The accompanying figure shows a rectangular xy-coordinate system and an x'y' coordinate system with
skewed axes. Assuming that 1-unit scales are used on all the axes, find the x'y' coordinates of the points
whose xy-coordinates are given.
(a) (1, 1)
(b) (1, 0)
(c) (0, 1)
(d) (a, b)


Figure Ex-16 (axis label: x and x')


17. The accompanying figure shows a rectangular xy-coordinate system determined by the unit basis vectors i and
j and an x'y' coordinate system determined by unit basis vectors u_1 and u_2. Find the x'y' coordinates of the
points whose xy-coordinates are given.

(a) (√3, 1)

(b) (1, 0)

(c) (0, 1)

(d) (a, b)


Figure Ex-17


Answer:


(a) (2, 0)

(b) (2/√3, -1/√3)

(c) (0, 1)

(d) (2a/√3, b - a/√3)


18. The basis that we gave for M_22 in Example 4 consisted of noninvertible matrices. Do you think that there is a
basis for M_22 consisting of invertible matrices? Justify your answer.


19. Prove that R^∞ is infinite-dimensional.


True-False Exercises 


In parts (a)—(e) determine whether the statement is true or false, and justify your answer. 


(a) If V = span{v_1, ..., v_n}, then {v_1, ..., v_n} is a basis for V.


Answer: 


False 


(b) Every linearly independent subset of a vector space V is a basis for V. 
Answer: 


False 


(c) If {v_1, v_2, ..., v_n} is a basis for a vector space V, then every vector in V can be expressed as a linear
combination of v_1, v_2, ..., v_n.


Answer: 


True 


(d) The coordinate vector of a vector x in R^n relative to the standard basis for R^n is x.
Answer: 


True 


(e) Every basis of P_4 contains at least one polynomial of degree 3 or less.
Answer: 


False 




4.5 Dimension 


We showed in the previous section that the standard basis for R^n has n vectors and hence that the standard basis
for R^3 has three vectors, the standard basis for R^2 has two vectors, and the standard basis for R^1 (= R) has one
vector. Since we think of space as three dimensional, a plane as two dimensional, and a line as one
dimensional, there seems to be a link between the number of vectors in a basis and the dimension of a vector
space. We will develop this idea in this section.


Number of Vectors in a Basis 


Our first goal in this section is to establish the following fundamental theorem. 


THEOREM 4.5.1 


All bases for a finite-dimensional vector space have the same number of vectors. 


To prove this theorem we will need the following preliminary result, whose proof is deferred to the end of the 
section. 


THEOREM 4.5.2 


Let V be a finite-dimensional vector space, and let {v_1, v_2, ..., v_n} be any basis.
(a) If a set has more than n vectors, then it is linearly dependent.


(b) If a set has fewer than n vectors, then it does not span V. 


Some writers regard the empty set to be a basis 
for the zero vector space. This is consistent with 
our definition of dimension, since the empty set 
has no vectors and the zero vector space has 
dimension zero. 


We can now see rather easily why Theorem 4.5.1 is true; for if
S = {v_1, v_2, ..., v_n}
is an arbitrary basis for V, then the linear independence of S implies that any set in V with more than n vectors
is linearly dependent and any set in V with fewer than n vectors does not span V. Thus, unless a set in V has
exactly n vectors it cannot be a basis.


We noted in the introduction to this section that for certain familiar vector spaces the intuitive notion of 
dimension coincides with the number of vectors in a basis. The following definition makes this idea precise. 


Engineers often use the term degrees of 
freedom as a synonym for dimension. 


DEFINITION 1 


The dimension of a finite-dimensional vector space V is denoted by dim(V) and is defined to be the
number of vectors in a basis for V. In addition, the zero vector space is defined to have dimension zero. 


EXAMPLE 1 Dimensions of Some Familiar Vector Spaces


dim(R^n) = n        The standard basis has n vectors.

dim(P_n) = n + 1    The standard basis has n + 1 vectors.

dim(M_mn) = mn      The standard basis has mn vectors.


EXAMPLE 2 Dimension of Span(S)


If S = {v_1, v_2, ..., v_r} is a linearly independent set in a vector space V, then S is automatically
a basis for span(S) (why?), and this implies that


dim[span(S)] = r


In words, the dimension of the space spanned by a linearly independent set of vectors is equal to 
the number of vectors in that set. 


EXAMPLE 3 Dimension of a Solution Space 


Find a basis for and the dimension of the solution space of the homogeneous system

2x_1 + 2x_2 -  x_3         + x_5 = 0
-x_1 -  x_2 + 2x_3 - 3x_4 + x_5 = 0
 x_1 +  x_2 - 2x_3         - x_5 = 0
               x_3 +  x_4 + x_5 = 0


Solution We leave it for you to solve this system by Gauss-Jordan elimination and show that
its general solution is


x_1 = -s - t, x_2 = s, x_3 = -t, x_4 = 0, x_5 = t


which can be written in vector form as
(x_1, x_2, x_3, x_4, x_5) = (-s - t, s, -t, 0, t)
or, alternatively, as
(x_1, x_2, x_3, x_4, x_5) = s(-1, 1, 0, 0, 0) + t(-1, 0, -1, 0, 1)


This shows that the vectors v_1 = (-1, 1, 0, 0, 0) and v_2 = (-1, 0, -1, 0, 1) span the
solution space. Since neither vector is a scalar multiple of the other, they are linearly independent
and hence form a basis for the solution space. Thus, the solution space has dimension 2.


EXAMPLE 4 Dimension of a Solution Space


Find a basis for and the dimension of the solution space of the homogeneous system


 x_1 + 3x_2 - 2x_3          + 2x_5          = 0
2x_1 + 6x_2 - 5x_3 -  2x_4 + 4x_5 -  3x_6 = 0
              5x_3 + 10x_4         + 15x_6 = 0
2x_1 + 6x_2         +  8x_4 + 4x_5 + 18x_6 = 0


Solution In Example 6 of Section 1.2 we found the solution of this system to be
x_1 = -3r - 4s - 2t, x_2 = r, x_3 = -2s, x_4 = s, x_5 = t, x_6 = 0
which can be written in vector form as
(x_1, x_2, x_3, x_4, x_5, x_6) = (-3r - 4s - 2t, r, -2s, s, t, 0)
or, alternatively, as
(x_1, x_2, x_3, x_4, x_5, x_6) = r(-3, 1, 0, 0, 0, 0) + s(-4, 0, -2, 1, 0, 0) + t(-2, 0, 0, 0, 1, 0)
This shows that the vectors
v_1 = (-3, 1, 0, 0, 0, 0), v_2 = (-4, 0, -2, 1, 0, 0), v_3 = (-2, 0, 0, 0, 1, 0)


span the solution space. We leave it for you to check that these vectors are linearly independent
by showing that none of them is a linear combination of the other two (but see the remark that
follows). Thus, the solution space has dimension 3.


Remark It can be shown that for a homogeneous linear system, the method of the last example always 
produces a basis for the solution space of the system. We omit the formal proof. 
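
The method used in the last two examples can be sketched in code. The following Python sketch is an illustration rather than part of the text: the function name null_space_basis is an assumption introduced here, and exact rational arithmetic from the standard fractions module stands in for hand computation. It reduces the coefficient matrix to reduced row echelon form and then sets each free variable to 1 in turn (with the other free variables set to 0), which yields one basis vector per parameter.

```python
from fractions import Fraction


def null_space_basis(A):
    """Basis for the solution space of A x = 0, found from the reduced row echelon form."""
    m = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(m), len(m[0])
    pivot_cols = []
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no leading 1 in this column: the variable is free
        m[r], m[pivot] = m[pivot], m[r]
        m[r] = [x / m[r][c] for x in m[r]]  # scale to get a leading 1
        for i in range(rows):
            if i != r:
                factor = m[i][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        pivot_cols.append(c)
        r += 1
        if r == rows:
            break
    free_cols = [c for c in range(cols) if c not in pivot_cols]
    basis = []
    for f in free_cols:
        # Set one free variable to 1, the others to 0, and solve for the leading variables.
        vec = [Fraction(0)] * cols
        vec[f] = Fraction(1)
        for row_index, p in enumerate(pivot_cols):
            vec[p] = -m[row_index][f]
        basis.append(vec)
    return basis


# Coefficient matrices of the homogeneous systems in Examples 3 and 4.
example_3 = [[2, 2, -1, 0, 1],
             [-1, -1, 2, -3, 1],
             [1, 1, -2, 0, -1],
             [0, 0, 1, 1, 1]]
example_4 = [[1, 3, -2, 0, 2, 0],
             [2, 6, -5, -2, 4, -3],
             [0, 0, 5, 10, 0, 15],
             [2, 6, 0, 8, 4, 18]]

for system in (example_3, example_4):
    basis = null_space_basis(system)
    # Print the dimension followed by the basis vectors.
    print(len(basis), [[int(x) for x in vec] for vec in basis])
```

For these two systems the sketch reports dimensions 2 and 3 and returns the same vectors that were obtained in the examples. Each basis vector has a 1 in the position of its own free variable and 0 in the positions of the other free variables, which is why vectors produced this way are linearly independent.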


Some Fundamental Theorems 


We will devote the remainder of this section to a series of theorems that reveal the subtle interrelationships 
among the concepts of linear independence, basis, and dimension. These theorems are not simply exercises in 
mathematical theory—they are essential to the understanding of vector spaces and the applications that build 
on them. 


We will start with a theorem (proved at the end of this section) that is concerned with the effect on linear 
independence and spanning if a vector is added to or removed from a given nonempty set of vectors. 
Informally stated, if you start with a linearly independent set S and adjoin to it a vector that is not a linear 
combination of those in S, then the enlarged set will still be linearly independent. Also, if you start with a set S 
of two or more vectors in which one of the vectors is a linear combination of the others, then that vector can be 
removed from S without affecting span(S) (Figure 4.5.1). 


Figure 4.5.1 Three panels:
• The vector outside the plane can be adjoined to the other two without affecting their linear independence.
• Any of the vectors can be removed, and the remaining two will still span the plane.
• Either of the collinear vectors can be removed, and the remaining two will still span the plane.


THEOREM 4.5.3. Plus/Minus Theorem 


Let S be a nonempty set of vectors in a vector space V.
(a) If S is a linearly independent set, and if v is a vector in V that is outside of span(S), then the set
S ∪ {v} that results by inserting v into S is still linearly independent.


(b) If v is a vector in S that is expressible as a linear combination of other vectors in S, and if S - {v}
denotes the set obtained by removing v from S, then S and S - {v} span the same space; that is,


span(S) = span(S - {v})


EXAMPLE 5 Applying the Plus/Minus Theorem
Show that p_1 = 1 - x^2, p_2 = 2 - x^2, and p_3 = x^3 are linearly independent vectors.


Solution The set S = {p_1, p_2} is linearly independent, since neither vector in S is a scalar
multiple of the other. Since the vector p_3 cannot be expressed as a linear combination of the
vectors in S (why?), it can be adjoined to S to produce a linearly independent set


S' = {p_1, p_2, p_3}.


In general, to show that a set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for a vector space $V$, we must show that the vectors are linearly independent and span $V$. However, if we happen to know that $V$ has dimension $n$ (so that $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ contains the right number of vectors for a basis), then it suffices to check either linear independence or spanning; the remaining condition will hold automatically. This is the content of the following theorem.


THEOREM 4.5.4 


Let $V$ be an $n$-dimensional vector space, and let $S$ be a set in $V$ with exactly $n$ vectors. Then $S$ is a basis for $V$ if and only if $S$ spans $V$ or $S$ is linearly independent.


Proof Assume that $S$ has exactly $n$ vectors and spans $V$. To prove that $S$ is a basis, we must show that $S$ is a linearly independent set. But if this is not so, then some vector $\mathbf{v}$ in $S$ is a linear combination of the remaining vectors. If we remove this vector from $S$, then it follows from Theorem 4.5.3b that the remaining set of $n - 1$ vectors still spans $V$. But this is impossible, since it follows from Theorem 4.5.2b that no set with fewer than $n$ vectors can span an $n$-dimensional vector space. Thus $S$ is linearly independent.

Assume that $S$ has exactly $n$ vectors and is a linearly independent set. To prove that $S$ is a basis, we must show that $S$ spans $V$. But if this is not so, then there is some vector $\mathbf{v}$ in $V$ that is not in $\operatorname{span}(S)$. If we insert this vector into $S$, then it follows from Theorem 4.5.3a that this set of $n + 1$ vectors is still linearly independent. But this is impossible, since Theorem 4.5.2a states that no set with more than $n$ vectors in an $n$-dimensional vector space can be linearly independent. Thus $S$ spans $V$.


EXAMPLE 6 Bases by Inspection 


(a) By inspection, explain why $\mathbf{v}_1 = (-3, 7)$ and $\mathbf{v}_2 = (5, 5)$ form a basis for $R^2$.

(b) By inspection, explain why $\mathbf{v}_1 = (2, 0, -1)$, $\mathbf{v}_2 = (4, 0, 7)$, and $\mathbf{v}_3 = (-1, 1, 4)$ form a basis for $R^3$.


Solution 


(a) Since neither vector is a scalar multiple of the other, the two vectors form a linearly independent set in the two-dimensional space $R^2$, and hence they form a basis by Theorem 4.5.4.

(b) The vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ form a linearly independent set in the $xz$-plane (why?). The vector $\mathbf{v}_3$ is outside of the $xz$-plane, so the set $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is also linearly independent. Since $R^3$ is three-dimensional, Theorem 4.5.4 implies that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a basis for $R^3$.
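
The reasoning in Example 6 can also be checked mechanically. The following is a minimal sketch, not part of the original text: it assumes vectors in $R^n$ are given as tuples and uses a hypothetical helper `rank` (Gaussian elimination over exact fractions) to test linear independence, which by Theorem 4.5.4 is enough when the set has exactly $n$ vectors.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivot rows found so far
    for col in range(len(m[0])):
        # look for a row at or below r with a nonzero entry in this column
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_basis(vectors, n):
    """Exactly n vectors in R^n form a basis when they are linearly independent."""
    return len(vectors) == n and rank(vectors) == n

print(is_basis([(-3, 7), (5, 5)], 2))                     # Example 6(a)
print(is_basis([(2, 0, -1), (4, 0, 7), (-1, 1, 4)], 3))   # Example 6(b)
```

Both calls should report that the sets are bases, in agreement with the argument by inspection.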


The next theorem (whose proof is deferred to the end of this section) reveals two important facts about the 
vectors in a finite-dimensional vector space V: 


1. Every spanning set for a subspace is either a basis for that subspace or has a basis as a subset. 


2. Every linearly independent set in a subspace is either a basis for that subspace or can be extended to a basis 
for it. 


THEOREM 4.5.5 


Let S be a finite set of vectors in a finite-dimensional vector space V. 


(a) If S spans V but is not a basis for V, then S can be reduced to a basis for V by removing appropriate 
vectors from S. 


(b) If S is a linearly independent set that is not already a basis for V, then S can be enlarged to a basis
for V by inserting appropriate vectors into S.
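
Part (b) suggests a simple procedure in $R^n$: try the standard basis vectors one at a time and keep each one that lies outside the current span. The sketch below is an illustration under that assumption, not the book's method; `rank` and `extend_to_basis` are hypothetical helper names, and the input is assumed to be a linearly independent list of tuples.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def extend_to_basis(independent, n):
    """Adjoin standard basis vectors of R^n that lie outside the current span."""
    basis = [tuple(v) for v in independent]
    for i in range(n):
        e = tuple(1 if j == i else 0 for j in range(n))
        # Plus/Minus Theorem (a): adjoining e keeps the set independent exactly
        # when e is outside the span, i.e. when the rank goes up by one.
        if rank(basis + [e]) > len(basis):
            basis.append(e)
    return basis

# Hypothetical input chosen for illustration.
print(extend_to_basis([(1, 1, 0), (0, 1, 1)], 3))
```

The loop stops growing the set once it has $n$ vectors, since no further vector can raise the rank; by Theorem 4.5.4 the result is then a basis.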


We conclude this section with a theorem that relates the dimension of a vector space to the dimensions of its 
subspaces. 


THEOREM 4.5.6 


If $W$ is a subspace of a finite-dimensional vector space $V$, then:
(a) $W$ is finite-dimensional.

(b) $\dim(W) \le \dim(V)$.

(c) $W = V$ if and only if $\dim(W) = \dim(V)$.


Proof (a) We will leave the proof of this part for the exercises. 


Proof (b) Part (a) shows that $W$ is finite-dimensional, so it has a basis

$$S = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m\}$$

Either $S$ is also a basis for $V$ or it is not. If so, then $\dim(V) = m$, which means that $\dim(V) = \dim(W)$. If not, then because $S$ is a linearly independent set it can be enlarged to a basis for $V$ by part (b) of Theorem 4.5.5. But this implies that $\dim(W) < \dim(V)$, so we have shown that $\dim(W) \le \dim(V)$ in all cases.


Proof (c) Assume that $\dim(W) = \dim(V)$ and that

$$S = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m\}$$

is a basis for $W$. If $S$ is not also a basis for $V$, then being linearly independent $S$ can be extended to a basis for $V$ by part (b) of Theorem 4.5.5. But this would mean that $\dim(V) > \dim(W)$, which contradicts our hypothesis. Thus $S$ must also be a basis for $V$, which means that $W = V$. The converse is immediate: if $W = V$, then $\dim(W) = \dim(V)$.


Figure 4.5.2 illustrates the geometric relationship between the subspaces of $R^3$ in order of increasing dimension.

[Figure 4.5.2: subspaces of $R^3$, including a line through the origin (1-dimensional) and a plane through the origin (2-dimensional).]


OPTIONAL 


We conclude this section with optional proofs of Theorem 4.5.2, Theorem 4.5.3, and Theorem 4.5.5. 


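Proof of Theorem 4.5.2(a) Let $S' = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m\}$ be any set of $m$ vectors in $V$, where $m > n$. We want to show that $S'$ is linearly dependent. Since $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis, each $\mathbf{w}_i$ can be expressed as a linear combination of the vectors in $S$, say

$$\begin{aligned}
\mathbf{w}_1 &= a_{11}\mathbf{v}_1 + a_{21}\mathbf{v}_2 + \cdots + a_{n1}\mathbf{v}_n \\
\mathbf{w}_2 &= a_{12}\mathbf{v}_1 + a_{22}\mathbf{v}_2 + \cdots + a_{n2}\mathbf{v}_n \\
&\;\;\vdots \\
\mathbf{w}_m &= a_{1m}\mathbf{v}_1 + a_{2m}\mathbf{v}_2 + \cdots + a_{nm}\mathbf{v}_n
\end{aligned} \tag{1}$$

To show that $S'$ is linearly dependent, we must find scalars $k_1, k_2, \ldots, k_m$, not all zero, such that

$$k_1\mathbf{w}_1 + k_2\mathbf{w}_2 + \cdots + k_m\mathbf{w}_m = \mathbf{0} \tag{2}$$

Using the equations in (1), we can rewrite (2) as

$$\begin{aligned}
&(k_1a_{11} + k_2a_{12} + \cdots + k_ma_{1m})\mathbf{v}_1 \\
&\quad + (k_1a_{21} + k_2a_{22} + \cdots + k_ma_{2m})\mathbf{v}_2 \\
&\quad\;\;\vdots \\
&\quad + (k_1a_{n1} + k_2a_{n2} + \cdots + k_ma_{nm})\mathbf{v}_n = \mathbf{0}
\end{aligned}$$

Thus, from the linear independence of $S$, the problem of proving that $S'$ is a linearly dependent set reduces to showing there are scalars $k_1, k_2, \ldots, k_m$, not all zero, that satisfy

$$\begin{aligned}
a_{11}k_1 + a_{12}k_2 + \cdots + a_{1m}k_m &= 0 \\
a_{21}k_1 + a_{22}k_2 + \cdots + a_{2m}k_m &= 0 \\
&\;\;\vdots \\
a_{n1}k_1 + a_{n2}k_2 + \cdots + a_{nm}k_m &= 0
\end{aligned} \tag{3}$$

But (3) has more unknowns than equations, so the proof is complete since Theorem 1.2.2 guarantees the existence of nontrivial solutions.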
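
To make the last step concrete, the sketch below solves a system of the form (3) for a small case and returns one nontrivial solution. It is an illustration only: the function name `dependency` and the sample vectors are assumptions, the vectors are taken in $R^n$ (so the coefficients $a_{ij}$ are just their components), and the method is ordinary Gauss-Jordan elimination over exact fractions.

```python
from fractions import Fraction

def dependency(vectors):
    """Return scalars k (not all zero) with sum(k[i] * vectors[i]) = 0.

    Assumes there are more vectors than components (m > n).
    """
    n, m = len(vectors[0]), len(vectors)
    # Coefficient matrix of the homogeneous system: column j holds vector j.
    a = [[Fraction(v[i]) for v in vectors] for i in range(n)]
    pivots = []  # pivots[r] is the pivot column of row r
    r = 0
    for col in range(m):
        p = next((i for i in range(r, n) if a[i][col] != 0), None)
        if p is None:
            continue
        a[r], a[p] = a[p], a[r]
        a[r] = [x / a[r][col] for x in a[r]]
        for i in range(n):
            if i != r:
                f = a[i][col]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        pivots.append(col)
        r += 1
        if r == n:
            break
    # More unknowns than equations guarantees at least one free column.
    free = next(c for c in range(m) if c not in pivots)
    k = [Fraction(0)] * m
    k[free] = Fraction(1)          # set one free unknown to 1, the others to 0
    for row, col in enumerate(pivots):
        k[col] = -a[row][free]     # solve each pivot unknown from its row
    return k

vecs = [(1, 2), (3, 4), (5, 6)]    # three vectors in R^2 (hypothetical example)
k = dependency(vecs)
print(k, [sum(ki * v[i] for ki, v in zip(k, vecs)) for i in range(2)])
```

Because one free unknown is set to 1, the returned solution is never the trivial one, which is exactly what the proof needs.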


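Proof of Theorem 4.5.2(b) Let $S' = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m\}$ be any set of $m$ vectors in $V$, where $m < n$. We want to show that $S'$ does not span $V$. We will do this by showing that the assumption that $S'$ spans $V$ leads to a contradiction of the linear independence of $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$. If $S'$ spans $V$, then every vector in $V$ is a linear combination of the vectors in $S'$. In particular, each basis vector $\mathbf{v}_i$ is a linear combination of the vectors in $S'$, say

$$\begin{aligned}
\mathbf{v}_1 &= a_{11}\mathbf{w}_1 + a_{21}\mathbf{w}_2 + \cdots + a_{m1}\mathbf{w}_m \\
\mathbf{v}_2 &= a_{12}\mathbf{w}_1 + a_{22}\mathbf{w}_2 + \cdots + a_{m2}\mathbf{w}_m \\
&\;\;\vdots \\
\mathbf{v}_n &= a_{1n}\mathbf{w}_1 + a_{2n}\mathbf{w}_2 + \cdots + a_{mn}\mathbf{w}_m
\end{aligned} \tag{4}$$

To obtain our contradiction, we will show that there are scalars $k_1, k_2, \ldots, k_n$, not all zero, such that

$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n = \mathbf{0} \tag{5}$$

But (4) and (5) have the same form as (1) and (2) except that $m$ and $n$ are interchanged and the $\mathbf{w}$'s and $\mathbf{v}$'s are interchanged. Thus, the computations that led to (3) now yield

$$\begin{aligned}
a_{11}k_1 + a_{12}k_2 + \cdots + a_{1n}k_n &= 0 \\
a_{21}k_1 + a_{22}k_2 + \cdots + a_{2n}k_n &= 0 \\
&\;\;\vdots \\
a_{m1}k_1 + a_{m2}k_2 + \cdots + a_{mn}k_n &= 0
\end{aligned}$$

This linear system has more unknowns than equations and hence has nontrivial solutions by Theorem 1.2.2.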


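Proof of Theorem 4.5.3(a) Assume that $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a linearly independent set of vectors in $V$, and $\mathbf{v}$ is a vector in $V$ outside of $\operatorname{span}(S)$. To show that $S' = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r, \mathbf{v}\}$ is a linearly independent set, we must show that the only scalars that satisfy

$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r + k_{r+1}\mathbf{v} = \mathbf{0} \tag{6}$$

are $k_1 = k_2 = \cdots = k_r = k_{r+1} = 0$. But it must be true that $k_{r+1} = 0$, for otherwise we could solve (6) for $\mathbf{v}$ as a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$, contradicting the assumption that $\mathbf{v}$ is outside of $\operatorname{span}(S)$. Thus, (6) simplifies to

$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0} \tag{7}$$

which, by the linear independence of $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$, implies that

$$k_1 = k_2 = \cdots = k_r = 0$$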


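Proof of Theorem 4.5.3(b) Assume that $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a set of vectors in $V$, and (to be specific) suppose that $\mathbf{v}_r$ is a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r-1}$, say

$$\mathbf{v}_r = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_{r-1}\mathbf{v}_{r-1} \tag{8}$$

We want to show that if $\mathbf{v}_r$ is removed from $S$, then the remaining set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r-1}\}$ still spans $\operatorname{span}(S)$; that is, we must show that every vector $\mathbf{w}$ in $\operatorname{span}(S)$ is expressible as a linear combination of $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r-1}\}$. But if $\mathbf{w}$ is in $\operatorname{span}(S)$, then $\mathbf{w}$ is expressible in the form

$$\mathbf{w} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_{r-1}\mathbf{v}_{r-1} + k_r\mathbf{v}_r$$

or, on substituting (8),

$$\mathbf{w} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_{r-1}\mathbf{v}_{r-1} + k_r(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_{r-1}\mathbf{v}_{r-1})$$

which expresses $\mathbf{w}$ as a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r-1}$.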


Proof of Theorem 4.5.5(a) If $S$ is a set of vectors that spans $V$ but is not a basis for $V$, then $S$ is a linearly dependent set. Thus some vector $\mathbf{v}$ in $S$ is expressible as a linear combination of the other vectors in $S$. By the Plus/Minus Theorem (4.5.3b), we can remove $\mathbf{v}$ from $S$, and the resulting set $S'$ will still span $V$. If $S'$ is linearly independent, then $S'$ is a basis for $V$, and we are done. If $S'$ is linearly dependent, then we can remove some appropriate vector from $S'$ to produce a set $S''$ that still spans $V$. We can continue removing vectors in this way until we finally arrive at a set of vectors in $S$ that is linearly independent and spans $V$. This subset of $S$ is a basis for $V$.

Proof of Theorem 4.5.5(b) Suppose that $\dim(V) = n$. If $S$ is a linearly independent set that is not already a basis for $V$, then $S$ fails to span $V$, so there is some vector $\mathbf{v}$ in $V$ that is not in $\operatorname{span}(S)$. By the Plus/Minus Theorem (4.5.3a), we can insert $\mathbf{v}$ into $S$, and the resulting set $S'$ will still be linearly independent. If $S'$ spans $V$, then $S'$ is a basis for $V$, and we are finished. If $S'$ does not span $V$, then we can insert an appropriate vector into $S'$ to produce a set $S''$ that is still linearly independent. We can continue inserting vectors in this way until we reach a set with $n$ linearly independent vectors in $V$. This set will be a basis for $V$ by Theorem 4.5.4.


Concept Review 
• Dimension
• Relationships among the concepts of linear independence, basis, and dimension

Skills
• Find a basis for and the dimension of the solution space of a homogeneous linear system.
• Use dimension to determine whether a set of vectors is a basis for a finite-dimensional vector space.
• Extend a linearly independent set to a basis.


Exercise Set 4.5 


In Exercises 1-6, find a basis for the solution space of the homogeneous linear system, and find the dimension of that space.

1. $x_1 + x_2 - x_3 = 0$
   $-2x_1 - x_2 + 2x_3 = 0$
   $-x_1 + x_3 = 0$

Answer:
Basis: $(1, 0, 1)$; dimension = 1

2. $3x_1 + x_2 + x_3 + x_4 = 0$
   $5x_1 - x_2 + x_3 - x_4 = 0$

3. $x_1 - 4x_2 + 3x_3 - x_4 = 0$
   $2x_1 - 8x_2 + 6x_3 - 2x_4 = 0$

Answer:
Basis: $(4, 1, 0, 0)$, $(-3, 0, 1, 0)$, $(1, 0, 0, 1)$; dimension = 3

4. $x_1 - 3x_2 + x_3 = 0$
   $2x_1 - 6x_2 + 2x_3 = 0$
   $3x_1 - 9x_2 + 3x_3 = 0$

5. $2x_1 + x_2 + 3x_3 = 0$
   $x_1 + 5x_3 = 0$
   $x_2 + x_3 = 0$

Answer:
No basis; dimension = 0

6. $x + y + z = 0$
   $3x + 2y - 2z = 0$
   $4x + 3y - z = 0$
   $6x + 5y + z = 0$
7. Find bases for the following subspaces of $R^3$.
(a) The plane $3x - 2y + 5z = 0$.
(b) The plane $x - y = 0$.
(c) The line $x = 2t$, $y = -t$, $z = 4t$.
(d) All vectors of the form $(a, b, c)$, where $b = a + c$.

Answer:
(a) $\left(\tfrac{2}{3}, 1, 0\right)$, $\left(-\tfrac{5}{3}, 0, 1\right)$
(b) $(1, 1, 0)$, $(0, 0, 1)$
(c) $(2, -1, 4)$
(d) $(1, 1, 0)$, $(0, 1, 1)$
8. Find the dimensions of the following subspaces of $R^4$.
(a) All vectors of the form $(a, b, c, 0)$.
(b) All vectors of the form $(a, b, c, d)$, where $d = a + b$ and $c = a - b$.
(c) All vectors of the form $(a, b, c, d)$, where $a = b = c = d$.
9. Find the dimension of each of the following vector spaces.
(a) The vector space of all diagonal $n \times n$ matrices.
(b) The vector space of all symmetric $n \times n$ matrices.
(c) The vector space of all upper triangular $n \times n$ matrices.

Answer:
(a) $n$
(b) $\dfrac{n(n+1)}{2}$
(c) $\dfrac{n(n+1)}{2}$

10. Find the dimension of the subspace of $P_3$ consisting of all polynomials $a_0 + a_1x + a_2x^2 + a_3x^3$ for which $a_0 = 0$.

11. (a) Show that the set $W$ of all polynomials in $P_2$ such that $p(1) = 0$ is a subspace of $P_2$.
(b) Make a conjecture about the dimension of $W$.
(c) Confirm your conjecture by finding a basis for $W$.

12. Find a standard basis vector for $R^3$ that can be added to the set $\{\mathbf{v}_1, \mathbf{v}_2\}$ to produce a basis for $R^3$.
(a) $\mathbf{v}_1 = (-1, 2, 3)$, $\mathbf{v}_2 = (1, -2, -2)$
(b) $\mathbf{v}_1 = (1, -1, 0)$, $\mathbf{v}_2 = (3, 1, -2)$

13. Find standard basis vectors for $R^4$ that can be added to the set $\{\mathbf{v}_1, \mathbf{v}_2\}$ to produce a basis for $R^4$.
$\mathbf{v}_1 = (1, -4, 2, -3)$, $\mathbf{v}_2 = (-3, 8, -4, 6)$

Answer:
Any two of $(0, 1, 0, 0)$, $(0, 0, 1, 0)$, and $(0, 0, 0, 1)$ can be used.

14. Let $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ be a basis for a vector space $V$. Show that $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is also a basis, where $\mathbf{u}_1 = \mathbf{v}_1$, $\mathbf{u}_2 = \mathbf{v}_1 + \mathbf{v}_2$, and $\mathbf{u}_3 = \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_3$.

15. The vectors $\mathbf{v}_1 = (1, -2, 3)$ and $\mathbf{v}_2 = (0, 5, -3)$ are linearly independent. Enlarge $\{\mathbf{v}_1, \mathbf{v}_2\}$ to a basis for $R^3$.

Answer: 


16. The vectors $\mathbf{v}_1 = (1, -2, 3, -5)$ and $\mathbf{v}_2 = (0, -1, 2, -3)$ are linearly independent. Enlarge $\{\mathbf{v}_1, \mathbf{v}_2\}$ to a basis for $R^4$.

17. (a) Show that for every positive integer $n$, one can find $n + 1$ linearly independent vectors in $F(-\infty, \infty)$. [Hint: Look for polynomials.]
(b) Use the result in part (a) to prove that $F(-\infty, \infty)$ is infinite-dimensional.
(c) Prove that $C(-\infty, \infty)$, $C^m(-\infty, \infty)$, and $C^\infty(-\infty, \infty)$ are infinite-dimensional vector spaces.

18. Let $S$ be a basis for an $n$-dimensional vector space $V$. Show that if $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ form a linearly independent set of vectors in $V$, then the coordinate vectors $(\mathbf{v}_1)_S, (\mathbf{v}_2)_S, \ldots, (\mathbf{v}_r)_S$ form a linearly independent set in $R^n$, and conversely.


19. Using the notation from Exercise 18, show that if the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ span $V$, then the coordinate vectors $(\mathbf{v}_1)_S, (\mathbf{v}_2)_S, \ldots, (\mathbf{v}_r)_S$ span $R^n$, and conversely.

20. Find a basis for the subspace of $P_2$ spanned by the given vectors.
(a) $-1 + x - 2x^2$, $3 + 3x + 6x^2$, $9$
(b) $1 + x$, $x^2$, $-2 + 2x^2$, $3x$
(c) $1 + x - 3x^2$, $2 + 2x - 6x^2$, $3 + 3x - 9x^2$

[Hint: Let $S$ be the standard basis for $P_2$, and work with the coordinate vectors relative to $S$ as in Exercises 18 and 19.]


21. Prove: A subspace of a finite-dimensional vector space is finite-dimensional. 


22. State the two parts of Theorem 4.5.2 in contrapositive form. 
True-False Exercises
In parts (a)-(j) determine whether the statement is true or false, and justify your answer.

(a) The zero vector space has dimension zero.
Answer: True

(b) There is a set of 17 linearly independent vectors in $R^{17}$.
Answer: True

(c) There is a set of 11 vectors that span $R^{17}$.
Answer: False

(d) Every linearly independent set of five vectors in $R^5$ is a basis for $R^5$.
Answer: True

(e) Every set of five vectors that spans $R^5$ is a basis for $R^5$.
Answer: True

(f) Every set of vectors that spans $R^n$ contains a basis for $R^n$.
Answer: True

(g) Every linearly independent set of vectors in $R^n$ is contained in some basis for $R^n$.
Answer: True

(h) There is a basis for $M_{22}$ consisting of invertible matrices.
Answer: True

(i) If $A$ has size $n \times n$ and $I_n, A, A^2, \ldots, A^{n^2}$ are distinct matrices, then $\{I_n, A, A^2, \ldots, A^{n^2}\}$ is linearly dependent.
Answer: True

(j) There are at least two distinct three-dimensional subspaces of $P_2$.
Answer: False




4.6 Change of Basis 


A basis that is suitable for one problem may not be suitable for another, so it is a common process in the study of vector spaces to change from one basis to another. Because a basis is the vector space generalization of a coordinate system, changing bases is akin to changing coordinate axes in $R^2$ and $R^3$. In this section we will study problems related to change of basis.


Coordinate Maps 


If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for a finite-dimensional vector space $V$, and if

$$(\mathbf{v})_S = (c_1, c_2, \ldots, c_n)$$

is the coordinate vector of $\mathbf{v}$ relative to $S$, then, as observed in Section 4.4, the mapping

$$\mathbf{v} \to (\mathbf{v})_S \tag{1}$$

creates a connection (a one-to-one correspondence) between vectors in the general vector space $V$ and vectors in the familiar vector space $R^n$. We call (1) the coordinate map from $V$ to $R^n$. In this section we will find it convenient to express coordinate vectors in the matrix form

$$[\mathbf{v}]_S = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \tag{2}$$

where the square brackets emphasize the matrix notation (Figure 4.6.1).

[Figure 4.6.1: the coordinate map from $V$ to $R^n$.]


Change of Basis 


There are many applications in which it is necessary to work with more than one coordinate system. In such 
cases it becomes important to know how the coordinates of a fixed vector relative to each coordinate system 
are related. This leads to the following problem. 


The Change-of-Basis Problem 


If $\mathbf{v}$ is a vector in a finite-dimensional vector space $V$, and if we change the basis for $V$ from a basis $B$ to a basis $B'$, how are the coordinate vectors $[\mathbf{v}]_B$ and $[\mathbf{v}]_{B'}$ related?


Remark To solve this problem, it will be convenient to refer to $B$ as the “old basis” and $B'$ as the “new basis.” Thus, our objective is to find a relationship between the old and new coordinates of a fixed vector $\mathbf{v}$ in $V$.


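For simplicity, we will solve this problem for two-dimensional spaces. The solution for $n$-dimensional spaces is similar. Let

$$B = \{\mathbf{u}_1, \mathbf{u}_2\} \quad\text{and}\quad B' = \{\mathbf{u}_1', \mathbf{u}_2'\}$$

be the old and new bases, respectively. We will need the coordinate vectors for the new basis vectors relative to the old basis. Suppose they are

$$[\mathbf{u}_1']_B = \begin{bmatrix} a \\ b \end{bmatrix} \quad\text{and}\quad [\mathbf{u}_2']_B = \begin{bmatrix} c \\ d \end{bmatrix} \tag{3}$$

That is,

$$\begin{aligned}
\mathbf{u}_1' &= a\mathbf{u}_1 + b\mathbf{u}_2 \\
\mathbf{u}_2' &= c\mathbf{u}_1 + d\mathbf{u}_2
\end{aligned} \tag{4}$$

Now let $\mathbf{v}$ be any vector in $V$, and let

$$[\mathbf{v}]_{B'} = \begin{bmatrix} k_1 \\ k_2 \end{bmatrix} \tag{5}$$

be the new coordinate vector, so that

$$\mathbf{v} = k_1\mathbf{u}_1' + k_2\mathbf{u}_2' \tag{6}$$

In order to find the old coordinates of $\mathbf{v}$, we must express $\mathbf{v}$ in terms of the old basis $B$. To do this, we substitute (4) into (6). This yields

$$\mathbf{v} = k_1(a\mathbf{u}_1 + b\mathbf{u}_2) + k_2(c\mathbf{u}_1 + d\mathbf{u}_2)$$

or

$$\mathbf{v} = (k_1a + k_2c)\mathbf{u}_1 + (k_1b + k_2d)\mathbf{u}_2$$

Thus, the old coordinate vector for $\mathbf{v}$ is

$$[\mathbf{v}]_B = \begin{bmatrix} k_1a + k_2c \\ k_1b + k_2d \end{bmatrix}$$

which, by using (5), can be written as

$$[\mathbf{v}]_B = \begin{bmatrix} a & c \\ b & d \end{bmatrix}\begin{bmatrix} k_1 \\ k_2 \end{bmatrix} = \begin{bmatrix} a & c \\ b & d \end{bmatrix}[\mathbf{v}]_{B'}$$

This equation states that the old coordinate vector $[\mathbf{v}]_B$ results when we multiply the new coordinate vector $[\mathbf{v}]_{B'}$ on the left by the matrix

$$P = \begin{bmatrix} a & c \\ b & d \end{bmatrix}$$

Since the columns of this matrix are the coordinates of the new basis vectors relative to the old basis [see (3)], we have the following solution of the change-of-basis problem.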


Solution of the Change-of-Basis Problem 


If we change the basis for a vector space $V$ from an old basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ to a new basis $B' = \{\mathbf{u}_1', \mathbf{u}_2', \ldots, \mathbf{u}_n'\}$, then for each vector $\mathbf{v}$ in $V$, the old coordinate vector $[\mathbf{v}]_B$ is related to the new coordinate vector $[\mathbf{v}]_{B'}$ by the equation

$$[\mathbf{v}]_B = P[\mathbf{v}]_{B'} \tag{7}$$

where the columns of $P$ are the coordinate vectors of the new basis vectors relative to the old basis; that is, the column vectors of $P$ are

$$[\mathbf{u}_1']_B,\quad [\mathbf{u}_2']_B,\quad \ldots,\quad [\mathbf{u}_n']_B \tag{8}$$


Transition Matrices 


The matrix $P$ in Equation (7) is called the transition matrix from $B'$ to $B$. For emphasis, we will often denote it by $P_{B' \to B}$. It follows from (8) that this matrix can be expressed in terms of its column vectors as

$$P_{B' \to B} = \big[\,[\mathbf{u}_1']_B \mid [\mathbf{u}_2']_B \mid \cdots \mid [\mathbf{u}_n']_B\,\big] \tag{9}$$

Similarly, the transition matrix from $B$ to $B'$ can be expressed in terms of its column vectors as

$$P_{B \to B'} = \big[\,[\mathbf{u}_1]_{B'} \mid [\mathbf{u}_2]_{B'} \mid \cdots \mid [\mathbf{u}_n]_{B'}\,\big] \tag{10}$$

Remark There is a simple way to remember both of these formulas using the terms “old basis” and “new basis” defined earlier in this section: In Formula (9) the old basis is $B'$ and the new basis is $B$, whereas in Formula (10) the old basis is $B$ and the new basis is $B'$. Thus, both formulas can be restated as follows:

The columns of the transition matrix from an old basis to a new basis are the coordinate vectors of the old basis relative to the new basis.


EXAMPLE 1 Finding Transition Matrices 


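Consider the bases $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B' = \{\mathbf{u}_1', \mathbf{u}_2'\}$ for $R^2$, where

$$\mathbf{u}_1 = (1, 0),\quad \mathbf{u}_2 = (0, 1),\quad \mathbf{u}_1' = (1, 1),\quad \mathbf{u}_2' = (2, 1)$$

(a) Find the transition matrix $P_{B' \to B}$ from $B'$ to $B$.
(b) Find the transition matrix $P_{B \to B'}$ from $B$ to $B'$.

Solution

(a) Here the old basis vectors are $\mathbf{u}_1'$ and $\mathbf{u}_2'$ and the new basis vectors are $\mathbf{u}_1$ and $\mathbf{u}_2$. We want to find the coordinate matrices of the old basis vectors $\mathbf{u}_1'$ and $\mathbf{u}_2'$ relative to the new basis vectors $\mathbf{u}_1$ and $\mathbf{u}_2$. To do this, first we observe that

$$\begin{aligned}
\mathbf{u}_1' &= \mathbf{u}_1 + \mathbf{u}_2 \\
\mathbf{u}_2' &= 2\mathbf{u}_1 + \mathbf{u}_2
\end{aligned}$$

from which it follows that

$$[\mathbf{u}_1']_B = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad [\mathbf{u}_2']_B = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

and hence that

$$P_{B' \to B} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}$$

(b) Here the old basis vectors are $\mathbf{u}_1$ and $\mathbf{u}_2$ and the new basis vectors are $\mathbf{u}_1'$ and $\mathbf{u}_2'$. As in part (a), we want to find the coordinate matrices of the old basis vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ relative to the new basis vectors $\mathbf{u}_1'$ and $\mathbf{u}_2'$. To do this, observe that

$$\begin{aligned}
\mathbf{u}_1 &= -\mathbf{u}_1' + \mathbf{u}_2' \\
\mathbf{u}_2 &= 2\mathbf{u}_1' - \mathbf{u}_2'
\end{aligned}$$

from which it follows that

$$[\mathbf{u}_1]_{B'} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \quad\text{and}\quad [\mathbf{u}_2]_{B'} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$$

and hence that

$$P_{B \to B'} = \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}$$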


Suppose now that $B$ and $B'$ are bases for a finite-dimensional vector space $V$. Since multiplication by $P_{B' \to B}$ maps coordinate vectors relative to the basis $B'$ into coordinate vectors relative to a basis $B$, and $P_{B \to B'}$ maps coordinate vectors relative to $B$ into coordinate vectors relative to $B'$, it follows that for every vector $\mathbf{v}$ in $V$ we have

$$[\mathbf{v}]_B = P_{B' \to B}[\mathbf{v}]_{B'} \tag{11}$$
$$[\mathbf{v}]_{B'} = P_{B \to B'}[\mathbf{v}]_B \tag{12}$$


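EXAMPLE 2 Computing Coordinate Vectors

Let $B$ and $B'$ be the bases in Example 1. Use an appropriate formula to find $[\mathbf{v}]_B$ given that

$$[\mathbf{v}]_{B'} = \begin{bmatrix} -3 \\ 5 \end{bmatrix}$$

Solution To find $[\mathbf{v}]_B$ we need to make the transition from $B'$ to $B$. It follows from Formula (11) and part (a) of Example 1 that

$$[\mathbf{v}]_B = P_{B' \to B}[\mathbf{v}]_{B'} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} -3 \\ 5 \end{bmatrix} = \begin{bmatrix} 7 \\ 2 \end{bmatrix}$$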


Invertibility of Transition Matrices 


If $B$ and $B'$ are bases for a finite-dimensional vector space $V$, then

$$(P_{B' \to B})(P_{B \to B'}) = P_{B \to B}$$

because multiplication by $(P_{B' \to B})(P_{B \to B'})$ first maps $B$-coordinates of a vector into $B'$-coordinates, and then maps those $B'$-coordinates back into the original $B$-coordinates. Since the net effect of the two operations is to leave each coordinate vector unchanged, we are led to conclude that $P_{B \to B}$ must be the identity matrix, that is,

$$(P_{B' \to B})(P_{B \to B'}) = I \tag{13}$$

(we omit the formal proof). For example, for the transition matrices obtained in Example 1 we have

$$(P_{B' \to B})(P_{B \to B'}) = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$$

It follows from (13) that $P_{B' \to B}$ is invertible and that its inverse is $P_{B \to B'}$. Thus, we have the following theorem.


THEOREM 4.6.1 


If $P$ is the transition matrix from a basis $B'$ to a basis $B$ for a finite-dimensional vector space $V$, then $P$ is invertible and $P^{-1}$ is the transition matrix from $B$ to $B'$.
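
Theorem 4.6.1 can be checked numerically for the two transition matrices found in Example 1. The sketch below is only an illustration: `matmul` is a small hypothetical helper for lists of rows, and the coordinate vector `v_new` is an arbitrary value chosen for the demonstration, not one taken from the text.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

P = [[1, 2], [1, 1]]      # transition matrix from B' to B (Example 1a)
Q = [[-1, 2], [1, -1]]    # transition matrix from B to B' (Example 1b)

# Formula (13): the product should be the identity matrix.
print(matmul(P, Q))

# Round trip: B'-coordinates -> B-coordinates -> back to B'-coordinates.
v_new = [[4], [-1]]        # hypothetical [v]_{B'}
v_old = matmul(P, v_new)   # Formula (11)
print(v_old, matmul(Q, v_old) == v_new)
```

The round trip returning the original coordinates is the same statement as $P^{-1} = P_{B \to B'}$, read one vector at a time.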


An Efficient Method for Computing Transition Matrices for $R^n$


Our next objective is to develop an efficient procedure for computing transition matrices between bases for $R^n$. As illustrated in Example 1, the first step in computing a transition matrix is to express each new basis vector as a linear combination of the old basis vectors. For $R^n$ this involves solving $n$ linear systems of $n$ equations in $n$ unknowns, each of which has the same coefficient matrix (why?). An efficient way to do this is by the method illustrated in Example 2 of Section 1.6, which is as follows:


A Procedure for Computing $P_{B \to B'}$


Step 1 Form the matrix $[B' \mid B]$.

Step 2 Use elementary row operations to reduce the matrix in Step 1 to reduced row echelon form.

Step 3 The resulting matrix will be $[I \mid P_{B \to B'}]$.

Step 4 Extract the matrix $P_{B \to B'}$ from the right side of the matrix in Step 3.


This procedure is captured in the following diagram.

$$[\text{new basis} \mid \text{old basis}] \xrightarrow{\text{row operations}} [I \mid \text{transition from old to new}] \tag{14}$$
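
The four steps can be sketched in a few lines of code. This is a minimal illustration rather than a library routine: the function name `transition_matrix` and its argument order are assumptions, basis vectors are passed as tuples and placed in the augmented matrix as columns, and exact fractions are used so that the row reduction introduces no rounding.

```python
from fractions import Fraction

def transition_matrix(old_basis, new_basis):
    """Row-reduce [new basis | old basis] to [I | P], where P goes from old to new."""
    n = len(new_basis)
    # Step 1: basis vectors become columns, so row i holds the i-th components.
    aug = [[Fraction(v[i]) for v in new_basis] + [Fraction(v[i]) for v in old_basis]
           for i in range(n)]
    # Step 2: Gauss-Jordan elimination on the left block.
    for col in range(n):
        # Raises StopIteration if new_basis is not actually a basis.
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    # Steps 3 and 4: the left block is now I; return the right block.
    return [row[n:] for row in aug]

B = [(1, 0), (0, 1)]        # basis B of Example 1
B_prime = [(1, 1), (2, 1)]  # basis B' of Example 1

print(transition_matrix(old_basis=B, new_basis=B_prime))   # P from B to B'
print(transition_matrix(old_basis=B_prime, new_basis=B))   # P from B' to B
```

Passing the bases by keyword keeps the roles of "old" and "new" explicit, which is where mistakes with Formula (14) usually occur.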


EXAMPLE 3 Example 1 Revisited 


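In Example 1 we considered the bases $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B' = \{\mathbf{u}_1', \mathbf{u}_2'\}$ for $R^2$, where

$$\mathbf{u}_1 = (1, 0),\quad \mathbf{u}_2 = (0, 1),\quad \mathbf{u}_1' = (1, 1),\quad \mathbf{u}_2' = (2, 1)$$

(a) Use Formula (14) to find the transition matrix from $B'$ to $B$.

(b) Use Formula (14) to find the transition matrix from $B$ to $B'$.


Solution


(a) Here $B'$ is the old basis and $B$ is the new basis, so

$$[\text{new basis} \mid \text{old basis}] = \left[\begin{array}{cc|cc} 1 & 0 & 1 & 2 \\ 0 & 1 & 1 & 1 \end{array}\right]$$

Since the left side is already the identity matrix, no reduction is needed. We see by inspection that the transition matrix is

$$P_{B' \to B} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}$$

which agrees with the result in Example 1.

(b) Here $B$ is the old basis and $B'$ is the new basis, so

$$[\text{new basis} \mid \text{old basis}] = \left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{array}\right]$$

By reducing this matrix so that the left side becomes the identity, we obtain (verify)

$$[I \mid \text{transition from old to new}] = \left[\begin{array}{cc|cc} 1 & 0 & -1 & 2 \\ 0 & 1 & 1 & -1 \end{array}\right]$$

so the transition matrix is

$$P_{B \to B'} = \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}$$

which also agrees with the result in Example 1.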


Transition to the Standard Basis for $R^n$


Note that in part (a) of the last example the column vectors of the matrix that made the transition from the basis $B'$ to the standard basis turned out to be the vectors in $B'$ written in column form. This illustrates the following general result.


THEOREM 4.6.2


Let $B' = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ be any basis for the vector space $R^n$ and let $S = \{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ be the standard basis for $R^n$. If the vectors in these bases are written in column form, then

$$P_{B' \to S} = [\mathbf{u}_1 \mid \mathbf{u}_2 \mid \cdots \mid \mathbf{u}_n] \tag{15}$$


It follows from this theorem that if

$$A = [\mathbf{u}_1 \mid \mathbf{u}_2 \mid \cdots \mid \mathbf{u}_n]$$

is any invertible $n \times n$ matrix, then $A$ can be viewed as the transition matrix from the basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ for $R^n$ to the standard basis for $R^n$. Thus, for example, the matrix

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}$$

which was shown to be invertible in Example 4 of Section 1.5, is the transition matrix from the basis

$$\mathbf{u}_1 = (1, 2, 1),\quad \mathbf{u}_2 = (2, 5, 0),\quad \mathbf{u}_3 = (3, 3, 8)$$

to the basis

$$\mathbf{e}_1 = (1, 0, 0),\quad \mathbf{e}_2 = (0, 1, 0),\quad \mathbf{e}_3 = (0, 0, 1)$$


Concept Review 
• Coordinate map
• Change-of-basis problem
• Transition matrix


Skills
• Find coordinate vectors relative to a given basis directly.
• Find the transition matrix from one basis to another.
• Use the transition matrix to compute coordinate vectors.


Exercise Set 4.6 


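1. Find the coordinate vector for $\mathbf{w}$ relative to the basis $S = \{\mathbf{u}_1, \mathbf{u}_2\}$ for $R^2$.
(a) $\mathbf{u}_1 = (1, 0)$, $\mathbf{u}_2 = (0, 1)$, $\mathbf{w} = (3, -7)$
(b) $\mathbf{u}_1 = (2, -4)$, $\mathbf{u}_2 = (3, 8)$, $\mathbf{w} = (1, 1)$
(c) $\mathbf{u}_1 = (1, 1)$, $\mathbf{u}_2 = (0, 2)$, $\mathbf{w} = (a, b)$


Answer:

(a) $[\mathbf{w}]_S = \begin{bmatrix} 3 \\ -7 \end{bmatrix}$

(b) $[\mathbf{w}]_S = \begin{bmatrix} \tfrac{5}{28} \\ \tfrac{3}{14} \end{bmatrix}$

(c) $[\mathbf{w}]_S = \begin{bmatrix} a \\ \tfrac{b - a}{2} \end{bmatrix}$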


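2. Find the coordinate vector for $\mathbf{v}$ relative to the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$.
(a) $\mathbf{v} = (2, -1, 3)$; $\mathbf{v}_1 = (1, 0, 0)$, $\mathbf{v}_2 = (2, 2, 0)$, $\mathbf{v}_3 = (3, 3, 3)$
(b) $\mathbf{v} = (5, -12, 3)$; $\mathbf{v}_1 = (1, 2, 3)$, $\mathbf{v}_2 = (-4, 5, 6)$, $\mathbf{v}_3 = (7, -8, 9)$


3. Find the coordinate vector for $\mathbf{p}$ relative to the basis $S = \{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ for $P_2$.
(a) $\mathbf{p} = 4 - 3x + x^2$; $\mathbf{p}_1 = 1$, $\mathbf{p}_2 = x$, $\mathbf{p}_3 = x^2$
(b) $\mathbf{p} = 2 - x + x^2$; $\mathbf{p}_1 = 1 + x$, $\mathbf{p}_2 = 1 + x^2$, $\mathbf{p}_3 = x + x^2$

Answer:

(a) $(\mathbf{p})_S = (4, -3, 1)$, $[\mathbf{p}]_S = \begin{bmatrix} 4 \\ -3 \\ 1 \end{bmatrix}$

(b) $(\mathbf{p})_S = (0, 2, -1)$, $[\mathbf{p}]_S = \begin{bmatrix} 0 \\ 2 \\ -1 \end{bmatrix}$


4. Find the coordinate vector for $A$ relative to the basis $S = \{A_1, A_2, A_3, A_4\}$ for $M_{22}$.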


2 0 =i 11 
ash a=[ ab #=L0 0} 


. Consider the coordinate vectors 


6 3 
[w]g=]-—1]. [a]s=]0], [4] s= 
4 


(a) Find $\mathbf{w}$ if $S$ is the basis in Exercise 2(a).
(b) Find $\mathbf{q}$ if $S$ is the basis in Exercise 3(a).
(c) Find $B$ if $S$ is the basis in Exercise 4.


Answer:

(a) $\mathbf{w} = (16, 10, 12)$

(b) $\mathbf{q} = 3 + 4x^2$

(c) $B = \begin{bmatrix} 15 & -1 \\ 6 & 3 \end{bmatrix}$

- Consider the bases 8= {uy, uz} and Bi = fui , uy} for R2, where 


w-[J} oe [f} =f} ¢-[ 


(a) Find the transition matrix from 8’ to B. 
(b) Find the transition matrix from B to 8". 


(c) Compute the coordinate vector [w] p, where 
w= c) 
=5 


(d) Check your work by computing [w] p’ directly. 


and use 10 to compute [w] p’. 


. Repeat the directions of Exercise 6 with the same vector w but with 


u=[3} m=[4) «=[3} 4=[71 


Answer: 
(a) |} 13 _1 
10 
2 
75 0 
(b) 2 
" 2 
13 
a) — 
(c) wit 
10 -4 
wae] S| me-(2] 
5 


- Consider the bases 8= {uj, uz, uz} and B= {uj > uy, u;} for R3, where 


—3 —3 1 

uj = 0}, w= 2|, w= 
=3 —1 —1 
=—6 —2 —2 
u; = =—6 |, uw) = —6 |, U3 = —3 
0 4 7 


(a) Find the transition matrix from B to 3°. 


(b) Compute the coordinate vector [w] p, where 


w= 8 


and use 12 to compute [w] p’. 


(c) Check your work by computing [w] p’ directly. 


9. Repeat the directions of Exercise 8 with the same vector w, but with 


2 2 1 
uj=|1], ug=]—-1], uwz=]2 
1 1 1 
3 1 —1 
u; = 1 uw) = 1], us = 
— —3 2 
Answer: 
(a) 2: 
3 2 5 
1 
—2 —3 =) 
3 1 6 
(b) wes 
9 2 
[w] p=] —9], [w]e =| 23 
5 2 
6 


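10. Consider the bases $B = \{\mathbf{p}_1, \mathbf{p}_2\}$ and $B' = \{\mathbf{q}_1, \mathbf{q}_2\}$ for $P_1$, where

$$\mathbf{p}_1 = 6 + 3x,\quad \mathbf{p}_2 = 10 + 2x,\quad \mathbf{q}_1 = 2,\quad \mathbf{q}_2 = 3 + 2x$$

(a) Find the transition matrix from $B'$ to $B$.
(b) Find the transition matrix from $B$ to $B'$.
(c) Compute the coordinate vector $[\mathbf{p}]_B$, where $\mathbf{p} = -4 + x$, and use (12) to compute $[\mathbf{p}]_{B'}$.
(d) Check your work by computing $[\mathbf{p}]_{B'}$ directly.

11. Let $V$ be the space spanned by $\mathbf{f}_1 = \sin x$ and $\mathbf{f}_2 = \cos x$.
(a) Show that $\mathbf{g}_1 = 2\sin x + \cos x$ and $\mathbf{g}_2 = 3\cos x$ form a basis for $V$.
(b) Find the transition matrix from $B' = \{\mathbf{g}_1, \mathbf{g}_2\}$ to $B = \{\mathbf{f}_1, \mathbf{f}_2\}$.
(c) Find the transition matrix from $B$ to $B'$.
(d) Compute the coordinate vector $[\mathbf{h}]_B$, where $\mathbf{h} = 2\sin x - 5\cos x$, and use (12) to obtain $[\mathbf{h}]_{B'}$.
(e) Check your work by computing $[\mathbf{h}]_{B'}$ directly.


Answer:

(b) $\begin{bmatrix} 2 & 0 \\ 1 & 3 \end{bmatrix}$

(c) $\begin{bmatrix} \tfrac{1}{2} & 0 \\ -\tfrac{1}{6} & \tfrac{1}{3} \end{bmatrix}$

(d) $[\mathbf{h}]_B = \begin{bmatrix} 2 \\ -5 \end{bmatrix}$, $[\mathbf{h}]_{B'} = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$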


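12. Let $S$ be the standard basis for $R^2$, and let $B = \{\mathbf{v}_1, \mathbf{v}_2\}$ be the basis in which $\mathbf{v}_1 = (2, 1)$ and $\mathbf{v}_2 = (-3, 4)$.
(a) Find the transition matrix $P_{B \to S}$ by inspection.
(b) Use Formula (14) to find the transition matrix $P_{S \to B}$.
(c) Confirm that $P_{B \to S}$ and $P_{S \to B}$ are inverses of one another.
(d) Let $\mathbf{w} = (5, -3)$. Find $[\mathbf{w}]_B$ and then use Formula (11) to compute $[\mathbf{w}]_S$.
(e) Let $\mathbf{w} = (3, -5)$. Find $[\mathbf{w}]_S$ and then use Formula (12) to compute $[\mathbf{w}]_B$.


13. Let $S$ be the standard basis for $R^3$, and let $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ be the basis in which $\mathbf{v}_1 = (1, 2, 1)$, $\mathbf{v}_2 = (2, 5, 0)$, and $\mathbf{v}_3 = (3, 3, 8)$.
(a) Find the transition matrix $P_{B \to S}$ by inspection.
(b) Use Formula (14) to find the transition matrix $P_{S \to B}$.
(c) Confirm that $P_{B \to S}$ and $P_{S \to B}$ are inverses of one another.
(d) Let $\mathbf{w} = (5, -3, 1)$. Find $[\mathbf{w}]_B$ and then use Formula (11) to compute $[\mathbf{w}]_S$.
(e) Let $\mathbf{w} = (3, -5, 0)$. Find $[\mathbf{w}]_S$ and then use Formula (12) to compute $[\mathbf{w}]_B$.


Answer:

(a) $\begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}$

(b) $\begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix}$

(d) $[\mathbf{w}]_B = \begin{bmatrix} -239 \\ 77 \\ 30 \end{bmatrix}$, $[\mathbf{w}]_S = \begin{bmatrix} 5 \\ -3 \\ 1 \end{bmatrix}$

(e) $[\mathbf{w}]_S = \begin{bmatrix} 3 \\ -5 \\ 0 \end{bmatrix}$, $[\mathbf{w}]_B = \begin{bmatrix} -200 \\ 64 \\ 25 \end{bmatrix}$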


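14. Let $B_1 = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B_2 = \{\mathbf{v}_1, \mathbf{v}_2\}$ be the bases for $R^2$ in which $\mathbf{u}_1 = (2, 2)$, $\mathbf{u}_2 = (4, -1)$, $\mathbf{v}_1 = (1, 3)$, and $\mathbf{v}_2 = (-1, -1)$.
(a) Use Formula (14) to find the transition matrix $P_{B_2 \to B_1}$.
(b) Use Formula (14) to find the transition matrix $P_{B_1 \to B_2}$.
(c) Confirm that $P_{B_2 \to B_1}$ and $P_{B_1 \to B_2}$ are inverses of one another.
(d) Let $\mathbf{w} = (5, -3)$. Find $[\mathbf{w}]_{B_1}$ and then use the matrix $P_{B_1 \to B_2}$ to compute $[\mathbf{w}]_{B_2}$ from $[\mathbf{w}]_{B_1}$.
(e) Let $\mathbf{w} = (3, -5)$. Find $[\mathbf{w}]_{B_2}$ and then use the matrix $P_{B_2 \to B_1}$ to compute $[\mathbf{w}]_{B_1}$ from $[\mathbf{w}]_{B_2}$.


15. Let $B_1 = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B_2 = \{\mathbf{v}_1, \mathbf{v}_2\}$ be the bases for $R^2$ in which $\mathbf{u}_1 = (1, 2)$, $\mathbf{u}_2 = (2, 3)$, $\mathbf{v}_1 = (1, 3)$, and $\mathbf{v}_2 = (1, 4)$.
(a) Use Formula (14) to find the transition matrix $P_{B_2 \to B_1}$.
(b) Use Formula (14) to find the transition matrix $P_{B_1 \to B_2}$.
(c) Confirm that $P_{B_2 \to B_1}$ and $P_{B_1 \to B_2}$ are inverses of one another.
(d) Let $\mathbf{w} = (0, 1)$. Find $[\mathbf{w}]_{B_1}$ and then use the matrix $P_{B_1 \to B_2}$ to compute $[\mathbf{w}]_{B_2}$ from $[\mathbf{w}]_{B_1}$.
(e) Let $\mathbf{w} = (2, 5)$. Find $[\mathbf{w}]_{B_2}$ and then use the matrix $P_{B_2 \to B_1}$ to compute $[\mathbf{w}]_{B_1}$ from $[\mathbf{w}]_{B_2}$.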


Answer: 


(a)| 3 
at a! 


16. Let $B_1 = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ and $B_2 = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ be the bases for $R^3$ in which $\mathbf{u}_1 = (-3, 0, -3)$, $\mathbf{u}_2 = (-3, 2, -1)$, $\mathbf{u}_3 = (1, 6, -1)$, $\mathbf{v}_1 = (-6, -6, 0)$, $\mathbf{v}_2 = (-2, -6, 4)$, and $\mathbf{v}_3 = (-2, -3, 7)$.
(a) Find the transition matrix $P_{B_1 \to B_2}$.
(b) Let $\mathbf{w} = (-5, 8, -5)$. Find $[\mathbf{w}]_{B_1}$ and then use the transition matrix obtained in part (a) to compute $[\mathbf{w}]_{B_2}$ by matrix multiplication.
(c) Check the result in part (b) by computing $[\mathbf{w}]_{B_2}$ directly.


17. Follow the directions of Exercise 16 with the same vector $\mathbf{w}$ but with $\mathbf{u}_1 = (2, 1, 1)$, $\mathbf{u}_2 = (2, -1, 1)$, $\mathbf{u}_3 = (1, 2, 1)$, $\mathbf{v}_1 = (3, 1, -5)$, $\mathbf{v}_2 = (1, 1, -3)$, and $\mathbf{v}_3 = (-1, 0, 2)$.


Answer: 


(a) 3 2 


18. 


19. 


20. 


21. 


22. 


23. 


9 
[w]e, =| —9}|. [wla,=]} 23 
aad Z 
6 


18. Let $S = \{\mathbf{e}_1, \mathbf{e}_2\}$ be the standard basis for $R^2$, and let $B = \{\mathbf{v}_1, \mathbf{v}_2\}$ be the basis that results when the vectors in $S$ are reflected about the line $y = x$.
(a) Find the transition matrix $P_{B \to S}$.
(b) Let $P = P_{B \to S}$ and show that $P^T = P_{S \to B}$.

19. Let $S = \{\mathbf{e}_1, \mathbf{e}_2\}$ be the standard basis for $R^2$, and let $B = \{\mathbf{v}_1, \mathbf{v}_2\}$ be the basis that results when the vectors in $S$ are reflected about the line that makes an angle $\theta$ with the positive $x$-axis.
(a) Find the transition matrix $P_{B \to S}$.
(b) Let $P = P_{B \to S}$ and show that $P^T = P_{S \to B}$.


Answer:

(a) $\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}$


If 81, 82, and 83 are bases for 22, and if 


3] t 2 
Pa =( | and Pay85=| Ei 


then Pp,_,8,; =—_. 


21. If $P$ is the transition matrix from a basis $B'$ to a basis $B$, and $Q$ is the transition matrix from $B$ to a basis $C$, what is the transition matrix from $B'$ to $C$? What is the transition matrix from $C$ to $B'$?


22. To write the coordinate vector for a vector, it is necessary to specify an order for the vectors in the basis. If $P$ is the transition matrix from a basis $B'$ to a basis $B$, what is the effect on $P$ if we reverse the order of vectors in $B$ from $\mathbf{v}_1, \ldots, \mathbf{v}_n$ to $\mathbf{v}_n, \ldots, \mathbf{v}_1$? What is the effect on $P$ if we reverse the order of vectors in both $B'$ and $B$?


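Consider the matrix

$$P = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 2 \\ 0 & 2 & 1 \end{bmatrix}$$

(a) P is the transition matrix from what basis B to the standard basis $S = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ for $R^3$?
(b) P is the transition matrix from the standard basis $S = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ to what basis B for $R^3$?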


Answer: 
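(a) $B = \{(1, 1, 0), (1, 0, 2), (0, 2, 1)\}$


(b) $B = \left\{ \left(\tfrac{4}{5}, \tfrac{1}{5}, -\tfrac{2}{5}\right), \left(\tfrac{1}{5}, -\tfrac{1}{5}, \tfrac{2}{5}\right), \left(-\tfrac{2}{5}, \tfrac{2}{5}, \tfrac{1}{5}\right) \right\}$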


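24. The matrix

$$P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 2 \\ 0 & 1 & 1 \end{bmatrix}$$

is the transition matrix from what basis B to the basis $\{(1, 1, 1), (1, 1, 0), (1, 0, 0)\}$ for $R^3$?


25. Let B be a basis for $R^n$. Prove that the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ form a linearly independent set in $R^n$ if and
only if the vectors $[\mathbf{v}_1]_B, [\mathbf{v}_2]_B, \ldots, [\mathbf{v}_k]_B$ form a linearly independent set in $R^n$.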


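26. Let B be a basis for $R^n$. Prove that the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ span $R^n$ if and only if the vectors
$[\mathbf{v}_1]_B, [\mathbf{v}_2]_B, \ldots, [\mathbf{v}_k]_B$ span $R^n$.


27. If $[\mathbf{w}]_B = \mathbf{w}$ holds for all vectors $\mathbf{w}$ in $R^n$, what can you say about the basis B?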


True-False Exercises 


In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 
(a) If $B_1$ and $B_2$ are bases for a vector space V, then there exists a transition matrix from $B_1$ to $B_2$.
Answer: 


True 


(b) Transition matrices are invertible. 
Answer: 


True 


(c) If B is a basis for a vector space $R^n$, then $P_{B \to B}$ is the identity matrix.
Answer: 


True 


(d) If $P_{B_1 \to B_2}$ is a diagonal matrix, then each vector in $B_2$ is a scalar multiple of some vector in $B_1$.
Answer: 


True 


(e) If each vector in $B_2$ is a scalar multiple of some vector in $B_1$, then $P_{B_1 \to B_2}$ is a diagonal matrix.
Answer: 


False 


(f) If A is a square matrix, then $A = P_{B_1 \to B_2}$ for some bases $B_1$ and $B_2$ for $R^n$.
Answer: 


False 




4.7 Row Space, Column Space, and Null Space 


In this section we will study some important vector spaces that are associated with matrices. Our work here will provide 
us with a deeper understanding of the relationships between the solutions of a linear system and properties of its 
coefficient matrix. 


Row Space, Column Space, and Null Space 


Recall that vectors can be written in comma-delimited form or in matrix form as either row vectors or column vectors. 
In this section we will use the latter two. 


DEFINITION 1 


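For an $m \times n$ matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

the vectors

$$\begin{aligned} \mathbf{r}_1 &= [a_{11} \;\; a_{12} \;\; \cdots \;\; a_{1n}] \\ \mathbf{r}_2 &= [a_{21} \;\; a_{22} \;\; \cdots \;\; a_{2n}] \\ &\;\;\vdots \\ \mathbf{r}_m &= [a_{m1} \;\; a_{m2} \;\; \cdots \;\; a_{mn}] \end{aligned}$$

in $R^n$ that are formed from the rows of A are called the row vectors of A, and the vectors

$$\mathbf{c}_1 = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}, \quad \ldots, \quad \mathbf{c}_n = \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}$$

in $R^m$ formed from the columns of A are called the column vectors of A.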


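EXAMPLE 1 Row and Column Vectors of a 2 × 3 Matrix


Let

$$A = \begin{bmatrix} 2 & 1 & 0 \\ 3 & -1 & 4 \end{bmatrix}$$

The row vectors of A are

$$\mathbf{r}_1 = [2 \;\; 1 \;\; 0] \quad \text{and} \quad \mathbf{r}_2 = [3 \;\; -1 \;\; 4]$$

and the column vectors of A are

$$\mathbf{c}_1 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad \text{and} \quad \mathbf{c}_3 = \begin{bmatrix} 0 \\ 4 \end{bmatrix}$$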


The following definition defines three important vector spaces associated with a matrix. 


DEFINITION 2 


If A is an $m \times n$ matrix, then the subspace of $R^n$ spanned by the row vectors of A is called the row space of A,
and the subspace of $R^m$ spanned by the column vectors of A is called the column space of A. The solution space
of the homogeneous system of equations $A\mathbf{x} = \mathbf{0}$, which is a subspace of $R^n$, is called the null space of A.


In this section and the next we will be concerned with two general questions: 


Question 1. What relationships exist among the solutions of a linear system $A\mathbf{x} = \mathbf{b}$ and the row space, column space,
and null space of the coefficient matrix A?


Question 2. What relationships exist among the row space, column space, and null space of a matrix? 


Starting with the first question, suppose that 


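$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \quad \text{and} \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$


It follows from Formula 10 of Section 1.3 that if $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$ denote the column vectors of A, then the product $A\mathbf{x}$ can
be expressed as a linear combination of these vectors with coefficients from $\mathbf{x}$; that is,

$$A\mathbf{x} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n \qquad (1)$$

Thus, a linear system, $A\mathbf{x} = \mathbf{b}$, of m equations in n unknowns can be written as

$$x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n = \mathbf{b} \qquad (2)$$

from which we conclude that $A\mathbf{x} = \mathbf{b}$ is consistent if and only if $\mathbf{b}$ is expressible as a linear combination of the column
vectors of A. This yields the following theorem.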


THEOREM 4.7.1 


A system of linear equations $A\mathbf{x} = \mathbf{b}$ is consistent if and only if $\mathbf{b}$ is in the column space of A.


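EXAMPLE 2 A Vector b in the Column Space of A


Let $A\mathbf{x} = \mathbf{b}$ be the linear system

$$\begin{bmatrix} -1 & 3 & 2 \\ 1 & 2 & -3 \\ 2 & 1 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}$$

Show that $\mathbf{b}$ is in the column space of A by expressing it as a linear combination of the column vectors of
A.


Solution Solving the system by Gaussian elimination yields (verify)

$$x_1 = 2, \quad x_2 = -1, \quad x_3 = 3$$

It follows from this and Formula 2 that

$$2 \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix} - \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} + 3 \begin{bmatrix} 2 \\ -3 \\ -2 \end{bmatrix} = \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}$$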
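
Theorem 4.7.1 can also be checked mechanically: $\mathbf{b}$ is in the column space of A exactly when row reduction of the augmented matrix $[A \mid \mathbf{b}]$ produces no leading 1 in the last column. The following Python sketch illustrates this for the system of Example 2. The helper names `rref` and `in_column_space` are assumptions introduced here for illustration; they are not notation from the text.

```python
from fractions import Fraction

def rref(matrix):
    """Return (R, pivots): the reduced row echelon form and its pivot column indices."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivots, r = [], 0
    for c in range(len(rows[0])):
        # Look for a row at or below r with a nonzero entry in column c.
        pivot_row = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot_row is None:
            continue
        rows[r], rows[pivot_row] = rows[pivot_row], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]          # create the leading 1
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:             # clear the rest of the column
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows, pivots

def in_column_space(A, b):
    """True when Ax = b is consistent, i.e. the last augmented column has no pivot."""
    augmented = [list(row) + [bi] for row, bi in zip(A, b)]
    _, pivots = rref(augmented)
    return len(A[0]) not in pivots

A = [[-1, 3, 2], [1, 2, -3], [2, 1, -2]]
b = [1, -9, -3]
x = [2, -1, 3]                                         # solution found in Example 2

# Formula 2: b = x1*c1 + x2*c2 + x3*c3, computed column by column.
columns = [list(col) for col in zip(*A)]
combination = [sum(x[j] * columns[j][i] for j in range(3)) for i in range(3)]
print(in_column_space(A, b), combination)
```

Exact fractions are used instead of floating-point numbers so that the test for a zero entry is reliable.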


Recall from Theorem 3.4.4 that the general solution of a consistent linear system $A\mathbf{x} = \mathbf{b}$ can be obtained by adding any
specific solution of this system to the general solution of the corresponding homogeneous system $A\mathbf{x} = \mathbf{0}$. Keeping in
mind that the null space of A is the same as the solution space of $A\mathbf{x} = \mathbf{0}$, we can rephrase that theorem in the following
vector form.


THEOREM 4.7.2


If $\mathbf{x}_0$ is any solution of a consistent linear system $A\mathbf{x} = \mathbf{b}$, and if $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a basis for the null
space of A, then every solution of $A\mathbf{x} = \mathbf{b}$ can be expressed in the form

$$\mathbf{x} = \mathbf{x}_0 + c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k \qquad (3)$$

Conversely, for all choices of scalars $c_1, c_2, \ldots, c_k$, the vector $\mathbf{x}$ in this formula is a solution of $A\mathbf{x} = \mathbf{b}$.


Equation 3 gives a formula for the general solution of $A\mathbf{x} = \mathbf{b}$. The vector $\mathbf{x}_0$ in that formula is called a particular
solution of $A\mathbf{x} = \mathbf{b}$, and the remaining part of the formula is called the general solution of $A\mathbf{x} = \mathbf{0}$. In words, this
formula tells us that:


The general solution of a consistent linear system can be expressed as the sum of a particular solution of that system
and the general solution of the corresponding homogeneous system.


Geometrically, the solution set of $A\mathbf{x} = \mathbf{b}$ can be viewed as the translation by $\mathbf{x}_0$ of the solution space of $A\mathbf{x} = \mathbf{0}$ (Figure
4.7.1).


Figure 4.7.1 (labels: solution set of $A\mathbf{x} = \mathbf{b}$; solution space of $A\mathbf{x} = \mathbf{0}$)


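EXAMPLE 3 General Solution of a Linear System Ax = b


In the concluding subsection of Section 3.4 we compared solutions of the linear systems

$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \\ 5 \\ 6 \end{bmatrix}$$

and deduced that the general solution $\mathbf{x}$ of the nonhomogeneous system and the general solution $\mathbf{x}_h$ of the
corresponding homogeneous system (when written in column-vector form) are related by

$$\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}}_{\mathbf{x}} = \begin{bmatrix} -3r - 4s - 2t \\ r \\ -2s \\ s \\ t \\ \frac{1}{3} \end{bmatrix} = \underbrace{\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \frac{1}{3} \end{bmatrix}}_{\mathbf{x}_0} + \underbrace{r \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + s \begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{x}_h}$$

Recall from the Remark following Example 4 of Section 4.5 that the vectors in $\mathbf{x}_h$ form a basis for the solution space of
$A\mathbf{x} = \mathbf{0}$.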
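
The relationship in Theorem 4.7.2 can be checked numerically for the systems of Example 3. The sketch below is an illustration only; the names `mat_vec` and `general_solution` are hypothetical helpers introduced here. It confirms that each null-space basis vector solves $A\mathbf{x} = \mathbf{0}$ and that a particular solution plus any combination of those vectors solves $A\mathbf{x} = \mathbf{b}$.

```python
from fractions import Fraction

A = [[1, 3, -2, 0, 2, 0],
     [2, 6, -5, -2, 4, -3],
     [0, 0, 5, 10, 0, 15],
     [2, 6, 0, 8, 4, 18]]
b = [0, -1, 5, 6]

x0 = [0, 0, 0, 0, 0, Fraction(1, 3)]        # particular solution of Ax = b
null_basis = [[-3, 1, 0, 0, 0, 0],          # basis for the null space of A
              [-4, 0, -2, 1, 0, 0],
              [-2, 0, 0, 0, 1, 0]]

def mat_vec(M, x):
    """Matrix-vector product Mx for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

def general_solution(r, s, t):
    """x = x0 + r*v1 + s*v2 + t*v3, as in Formula 3."""
    coeffs = (r, s, t)
    return [x0[i] + sum(c * v[i] for c, v in zip(coeffs, null_basis))
            for i in range(len(x0))]

# Every choice of parameters should give a solution of Ax = b.
for params in [(0, 0, 0), (2, -1, 5), (Fraction(1, 2), 7, -3)]:
    print(params, mat_vec(A, general_solution(*params)) == b)
```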


Bases for Row Spaces, Column Spaces, and Null Spaces 


We first developed elementary row operations for the purpose of solving linear systems, and we know from that work 
that performing an elementary row operation on an augmented matrix does not change the solution set of the 
corresponding linear system. It follows that applying an elementary row operation to a matrix A does not change the 
solution set of the corresponding linear system $A\mathbf{x} = \mathbf{0}$, or, stated another way, it does not change the null space of A.
Thus we have the following theorem. 


THEOREM 4.7.3 


Elementary row operations do not change the null space of a matrix. 


The following theorem, whose proof is left as an exercise, is a companion to Theorem 4.7.3. 


THEOREM 4.7.4 


Elementary row operations do not change the row space of a matrix. 


Theorems 4.7.3 and 4.7.4 might tempt you into incorrectly believing that elementary row operations do not change the 
column space of a matrix. To see why this is not true, compare the matrices 


$$A = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 3 \\ 0 & 0 \end{bmatrix}$$

The matrix B can be obtained from A by adding −2 times the first row to the second. However, this operation has
changed the column space of A, since that column space consists of all scalar multiples of

$$\begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

whereas the column space of B consists of all scalar multiples of

$$\begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

and the two are different spaces.


EXAMPLE 4 Finding a Basis for the Null Space of a Matrix


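Find a basis for the null space of the matrix

$$A = \begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix}$$


Solution The null space of A is the solution space of the homogeneous linear system $A\mathbf{x} = \mathbf{0}$, which, as
shown in Example 3, has the basis

$$\mathbf{v}_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$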

Remark Observe that the basis vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ in the last example are the vectors that result by successively
setting one of the parameters in the general solution equal to 1 and the others equal to 0.
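
The observation in the Remark suggests a procedure: reduce A to reduced row echelon form, then for each free variable set that variable to 1 and the other free variables to 0. The following Python sketch applies this to the matrix of Example 4. It is an illustration under stated assumptions: `rref` and `null_space_basis` are helper names introduced here, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def rref(matrix):
    """Return (R, pivots): the reduced row echelon form and its pivot column indices."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivots, r = [], 0
    for c in range(len(rows[0])):
        pivot_row = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot_row is None:
            continue
        rows[r], rows[pivot_row] = rows[pivot_row], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows, pivots

def null_space_basis(matrix):
    """One basis vector per free variable: that variable is 1, other free variables are 0."""
    R, pivots = rref(matrix)
    n = len(matrix[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        # Row i of R reads x_p + (sum over free columns) = 0, so x_p = -R[i][f].
        for i, p in enumerate(pivots):
            v[p] = -R[i][f]
        basis.append(v)
    return basis

A = [[1, 3, -2, 0, 2, 0],
     [2, 6, -5, -2, 4, -3],
     [0, 0, 5, 10, 0, 15],
     [2, 6, 0, 8, 4, 18]]

for v in null_space_basis(A):
    print([int(x) if x.denominator == 1 else x for x in v])
```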


The following theorem makes it possible to find bases for the row and column spaces of a matrix in row echelon form
by inspection.


THEOREM 4.7.5 


If a matrix R is in row echelon form, then the row vectors with the leading 1's (the nonzero row vectors) form a 
basis for the row space of R, and the column vectors with the leading 1's of the row vectors form a basis for the 
column space of R. 


The proof involves little more than an analysis of the positions of the 0's and 1's of R. We omit the details. 


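EXAMPLE 5 Bases for Row and Column Spaces


The matrix

$$R = \begin{bmatrix} 1 & -2 & 5 & 0 & 3 \\ 0 & 1 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

is in row echelon form. From Theorem 4.7.5, the vectors

$$\begin{aligned} \mathbf{r}_1 &= [1 \;\; -2 \;\; 5 \;\; 0 \;\; 3] \\ \mathbf{r}_2 &= [0 \;\; 1 \;\; 3 \;\; 0 \;\; 0] \\ \mathbf{r}_3 &= [0 \;\; 0 \;\; 0 \;\; 1 \;\; 0] \end{aligned}$$

form a basis for the row space of R, and the vectors

$$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{c}_4 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$

form a basis for the column space of R.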


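EXAMPLE 6 Basis for a Row Space by Row Reduction


Find a basis for the row space of the matrix

$$A = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 2 & -6 & 9 & -1 & 8 & 2 \\ 2 & -6 & 9 & -1 & 9 & 7 \\ -1 & 3 & -4 & 2 & -5 & -4 \end{bmatrix}$$


Solution Since elementary row operations do not change the row space of a matrix, we can find a basis
for the row space of A by finding a basis for the row space of any row echelon form of A. Reducing A to
row echelon form, we obtain (verify)

$$R = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 0 & 0 & 1 & 3 & -2 & -6 \\ 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

By Theorem 4.7.5, the nonzero row vectors of R form a basis for the row space of R and hence form a
basis for the row space of A. These basis vectors are

$$\begin{aligned} \mathbf{r}_1 &= [1 \;\; -3 \;\; 4 \;\; -2 \;\; 5 \;\; 4] \\ \mathbf{r}_2 &= [0 \;\; 0 \;\; 1 \;\; 3 \;\; -2 \;\; -6] \\ \mathbf{r}_3 &= [0 \;\; 0 \;\; 0 \;\; 0 \;\; 1 \;\; 5] \end{aligned}$$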
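
The method of Example 6 can be sketched in Python as follows. This is an illustration rather than the text's own procedure: the helper reduces all the way to reduced row echelon form, which is one particular row echelon form, so its nonzero rows differ from the vectors found in Example 6 but span the same row space. The names `rref` and `row_space_basis` are assumptions introduced here.

```python
from fractions import Fraction

def rref(matrix):
    """Return (R, pivots): the reduced row echelon form and its pivot column indices."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivots, r = [], 0
    for c in range(len(rows[0])):
        pivot_row = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot_row is None:
            continue
        rows[r], rows[pivot_row] = rows[pivot_row], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows, pivots

def row_space_basis(matrix):
    """Nonzero rows of the reduced row echelon form (Theorems 4.7.4 and 4.7.5)."""
    R, pivots = rref(matrix)
    return R[:len(pivots)]          # the nonzero rows come first

A = [[1, -3, 4, -2, 5, 4],
     [2, -6, 9, -1, 8, 2],
     [2, -6, 9, -1, 9, 7],
     [-1, 3, -4, 2, -5, -4]]

for row in row_space_basis(A):
    print([int(x) for x in row])
```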


The problem of finding a basis for the column space of a matrix A in Example 6 is complicated by the fact that an
elementary row operation can alter its column space. However, the good news is that elementary row operations do not
alter dependence relationships among the column vectors. To make this more precise, suppose that $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k$ are
linearly dependent column vectors of A, so there are scalars $c_1, c_2, \ldots, c_k$ that are not all zero and such that

$$c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + \cdots + c_k\mathbf{w}_k = \mathbf{0} \qquad (4)$$

If we perform an elementary row operation on A, then these vectors will be changed into new column vectors
$\mathbf{w}_1', \mathbf{w}_2', \ldots, \mathbf{w}_k'$. At first glance it would seem possible that the transformed vectors might be linearly independent.
However, this is not so, since it can be proved that these new column vectors will be linearly dependent and, in fact,
related by an equation

$$c_1\mathbf{w}_1' + c_2\mathbf{w}_2' + \cdots + c_k\mathbf{w}_k' = \mathbf{0}$$

that has exactly the same coefficients as 4. It follows from the fact that elementary row operations are reversible that
they also preserve linear independence among column vectors (why?). The following theorem summarizes all of these
results.
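
One way to see why the coefficients are preserved is sketched below. It assumes the fact from the earlier treatment of elementary matrices that performing an elementary row operation on A is the same as multiplying A on the left by an elementary matrix; the symbol E for that matrix is introduced here for illustration. Each new column is then the product of E with the corresponding old column, and the dependency equation carries over term by term.

```latex
\mathbf{w}_i' = E\mathbf{w}_i \quad (i = 1, 2, \ldots, k)
\qquad\Longrightarrow\qquad
c_1\mathbf{w}_1' + c_2\mathbf{w}_2' + \cdots + c_k\mathbf{w}_k'
  = E\,(c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + \cdots + c_k\mathbf{w}_k)
  = E\,\mathbf{0}
  = \mathbf{0}
```

Because E is invertible, the same argument applied to $E^{-1}$ shows that a dependency among the new columns would force the same dependency among the old ones, which is the reversibility argument mentioned in the text.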


THEOREM 4.7.6 


If A and B are row equivalent matrices, then: 


(a) A given set of column vectors of A is linearly independent if and only if the corresponding column vectors 
of B are linearly independent. 


(b) A given set of column vectors of A forms a basis for the column space of A if and only if the corresponding 
column vectors of B form a basis for the column space of B. 


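EXAMPLE 7 Basis for a Column Space by Row Reduction


Find a basis for the column space of the matrix

$$A = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 2 & -6 & 9 & -1 & 8 & 2 \\ 2 & -6 & 9 & -1 & 9 & 7 \\ -1 & 3 & -4 & 2 & -5 & -4 \end{bmatrix}$$


Solution We observed in Example 6 that the matrix

$$R = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 0 & 0 & 1 & 3 & -2 & -6 \\ 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

is a row echelon form of A. Keeping in mind that A and R can have different column spaces, we cannot
find a basis for the column space of A directly from the column vectors of R. However, it follows from
Theorem 4.7.6b that if we can find a set of column vectors of R that forms a basis for the column space of
R, then the corresponding column vectors of A will form a basis for the column space of A.


Since the first, third, and fifth columns of R contain the leading 1's of the row vectors, the vectors

$$\mathbf{c}_1' = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{c}_3' = \begin{bmatrix} 4 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{c}_5' = \begin{bmatrix} 5 \\ -2 \\ 1 \\ 0 \end{bmatrix}$$

form a basis for the column space of R. Thus, the corresponding column vectors of A, which are

$$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 2 \\ 2 \\ -1 \end{bmatrix}, \quad \mathbf{c}_3 = \begin{bmatrix} 4 \\ 9 \\ 9 \\ -4 \end{bmatrix}, \quad \mathbf{c}_5 = \begin{bmatrix} 5 \\ 8 \\ 9 \\ -5 \end{bmatrix}$$

form a basis for the column space of A.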
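
The procedure of Example 7 can be sketched as: locate the pivot columns by row reduction, then take the columns of the original matrix A (not of R) in those positions. The Python code below is a minimal illustration; `rref` and `column_space_basis` are helper names assumed here, and column positions are zero-based.

```python
from fractions import Fraction

def rref(matrix):
    """Return (R, pivots): the reduced row echelon form and its pivot column indices."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivots, r = [], 0
    for c in range(len(rows[0])):
        pivot_row = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot_row is None:
            continue
        rows[r], rows[pivot_row] = rows[pivot_row], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows, pivots

def column_space_basis(matrix):
    """Columns of the ORIGINAL matrix that sit in the pivot positions (Theorem 4.7.6b)."""
    _, pivots = rref(matrix)
    return [[row[c] for row in matrix] for c in pivots]

A = [[1, -3, 4, -2, 5, 4],
     [2, -6, 9, -1, 8, 2],
     [2, -6, 9, -1, 9, 7],
     [-1, 3, -4, 2, -5, -4]]

print(rref(A)[1])                 # zero-based pivot columns
print(column_space_basis(A))      # the first, third, and fifth columns of A
```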


Up to now we have focused on methods for finding bases associated with matrices. Those methods can readily be 
adapted to the more general problem of finding a basis for the space spanned by a set of vectors in $R^n$.


EXAMPLE 8 Basis for a Vector Space Using Row Operations 


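Find a basis for the subspace of $R^5$ spanned by the vectors

$$\mathbf{v}_1 = (1, -2, 0, 0, 3), \quad \mathbf{v}_2 = (2, -5, -3, -2, 6),$$
$$\mathbf{v}_3 = (0, 5, 15, 10, 0), \quad \mathbf{v}_4 = (2, 6, 18, 8, 6)$$


Solution The space spanned by these vectors is the row space of the matrix

$$\begin{bmatrix} 1 & -2 & 0 & 0 & 3 \\ 2 & -5 & -3 & -2 & 6 \\ 0 & 5 & 15 & 10 & 0 \\ 2 & 6 & 18 & 8 & 6 \end{bmatrix}$$

Reducing this matrix to row echelon form, we obtain

$$\begin{bmatrix} 1 & -2 & 0 & 0 & 3 \\ 0 & 1 & 3 & 2 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

The nonzero row vectors in this matrix are

$$\mathbf{w}_1 = (1, -2, 0, 0, 3), \quad \mathbf{w}_2 = (0, 1, 3, 2, 0), \quad \mathbf{w}_3 = (0, 0, 1, 1, 0)$$

These vectors form a basis for the row space and consequently form a basis for the subspace of $R^5$
spanned by $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and $\mathbf{v}_4$.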


Bases Formed from Row and Column Vectors of a Matrix 


In all of the examples we have considered thus far we have looked for bases in which no restrictions were imposed on 
the individual vectors in the basis. We now want to focus on the problem of finding a basis for the row space of a matrix 
A consisting entirely of row vectors from A and a basis for the column space of A consisting entirely of column vectors 
of A. 


Looking back on our earlier work, we see that the procedure followed in Example 7 did, in fact, produce a basis for the 
column space of A consisting of column vectors of A, whereas the procedure used in Example 6 produced a basis for the 
row space of A, but that basis did not consist of row vectors of A. The following example shows how to adapt the 
procedure from Example 7 to find a basis for the row space of a matrix that is formed from its row vectors. 


EXAMPLE 9 Basis for the Row Space of a Matrix 


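Find a basis for the row space of

$$A = \begin{bmatrix} 1 & -2 & 0 & 0 & 3 \\ 2 & -5 & -3 & -2 & 6 \\ 0 & 5 & 15 & 10 & 0 \\ 2 & 6 & 18 & 8 & 6 \end{bmatrix}$$

consisting entirely of row vectors from A.


Solution We will transpose A, thereby converting the row space of A into the column space of $A^T$; then
we will use the method of Example 7 to find a basis for the column space of $A^T$; and then we will
transpose again to convert column vectors back to row vectors. Transposing A yields

$$A^T = \begin{bmatrix} 1 & 2 & 0 & 2 \\ -2 & -5 & 5 & 6 \\ 0 & -3 & 15 & 18 \\ 0 & -2 & 10 & 8 \\ 3 & 6 & 0 & 6 \end{bmatrix}$$

Reducing this matrix to row echelon form yields

$$\begin{bmatrix} 1 & 2 & 0 & 2 \\ 0 & 1 & -5 & -10 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

The first, second, and fourth columns contain the leading 1's, so the corresponding column vectors in $A^T$
form a basis for the column space of $A^T$; these are

$$\mathbf{c}_1 = \begin{bmatrix} 1 \\ -2 \\ 0 \\ 0 \\ 3 \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} 2 \\ -5 \\ -3 \\ -2 \\ 6 \end{bmatrix}, \quad \text{and} \quad \mathbf{c}_4 = \begin{bmatrix} 2 \\ 6 \\ 18 \\ 8 \\ 6 \end{bmatrix}$$

Transposing again and adjusting the notation appropriately yields the basis vectors

$$\mathbf{r}_1 = [1 \;\; -2 \;\; 0 \;\; 0 \;\; 3], \quad \mathbf{r}_2 = [2 \;\; -5 \;\; -3 \;\; -2 \;\; 6],$$

and

$$\mathbf{r}_4 = [2 \;\; 6 \;\; 18 \;\; 8 \;\; 6]$$

for the row space of A.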
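
The transpose technique of Example 9 translates directly into code: the pivot columns of $A^T$ identify which rows of A to keep. The sketch below is illustrative only; `rref` and `row_basis_from_rows` are helper names assumed here, and row positions are zero-based.

```python
from fractions import Fraction

def rref(matrix):
    """Return (R, pivots): the reduced row echelon form and its pivot column indices."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivots, r = [], 0
    for c in range(len(rows[0])):
        pivot_row = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot_row is None:
            continue
        rows[r], rows[pivot_row] = rows[pivot_row], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows, pivots

def row_basis_from_rows(matrix):
    """Rows of the matrix that form a basis for its row space, found via the transpose."""
    transpose = [list(col) for col in zip(*matrix)]
    _, pivots = rref(transpose)          # pivot columns of A^T correspond to rows of A
    return pivots, [matrix[i] for i in pivots]

A = [[1, -2, 0, 0, 3],
     [2, -5, -3, -2, 6],
     [0, 5, 15, 10, 0],
     [2, 6, 18, 8, 6]]

indices, basis = row_basis_from_rows(A)
print(indices)    # zero-based positions of the rows that were kept
print(basis)
```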


Next, we will give an example that adapts the methods we have developed above to solve the following general
problem in $R^n$:


PROBLEM


Given a set of vectors $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ in $R^n$, find a subset of these vectors that forms a basis for span(S),
and express those vectors that are not in that basis as a linear combination of the basis vectors.


EXAMPLE 10 Basis and Linear Combinations 


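(a) Find a subset of the vectors

$$\mathbf{v}_1 = (1, -2, 0, 3), \quad \mathbf{v}_2 = (2, -5, -3, 6),$$
$$\mathbf{v}_3 = (0, 1, 3, 0), \quad \mathbf{v}_4 = (2, -1, 4, -7), \quad \mathbf{v}_5 = (5, -8, 1, 2)$$

that forms a basis for the space spanned by these vectors.


(b) Express each vector not in the basis as a linear combination of the basis vectors.


Solution


(a) We begin by constructing a matrix that has $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_5$ as its column vectors (the columns, left to right, are $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4, \mathbf{v}_5$):

$$\begin{bmatrix} 1 & 2 & 0 & 2 & 5 \\ -2 & -5 & 1 & -1 & -8 \\ 0 & -3 & 3 & 4 & 1 \\ 3 & 6 & 0 & -7 & 2 \end{bmatrix} \qquad (5)$$

The first part of our problem can be solved by finding a basis for the column space of this matrix.
Reducing the matrix to reduced row echelon form and denoting the column vectors of the resulting
matrix by $\mathbf{w}_1$, $\mathbf{w}_2$, $\mathbf{w}_3$, $\mathbf{w}_4$, and $\mathbf{w}_5$ yields

$$\begin{bmatrix} 1 & 0 & 2 & 0 & 1 \\ 0 & 1 & -1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \qquad (6)$$

The leading 1's occur in columns 1, 2, and 4, so by Theorem 4.7.5,

$$\{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_4\}$$

is a basis for the column space of 6, and consequently,

$$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_4\}$$

is a basis for the column space of 5.


(b) We will start by expressing $\mathbf{w}_3$ and $\mathbf{w}_5$ as linear combinations of the basis vectors $\mathbf{w}_1$, $\mathbf{w}_2$, $\mathbf{w}_4$. The
simplest way of doing this is to express $\mathbf{w}_3$ and $\mathbf{w}_5$ in terms of basis vectors with smaller subscripts.
Accordingly, we will express $\mathbf{w}_3$ as a linear combination of $\mathbf{w}_1$ and $\mathbf{w}_2$, and we will express $\mathbf{w}_5$ as a
linear combination of $\mathbf{w}_1$, $\mathbf{w}_2$, and $\mathbf{w}_4$. By inspection of 6, these linear combinations are

$$\begin{aligned} \mathbf{w}_3 &= 2\mathbf{w}_1 - \mathbf{w}_2 \\ \mathbf{w}_5 &= \mathbf{w}_1 + \mathbf{w}_2 + \mathbf{w}_4 \end{aligned}$$

We call these the dependency equations. The corresponding relationships in 5 are

$$\begin{aligned} \mathbf{v}_3 &= 2\mathbf{v}_1 - \mathbf{v}_2 \\ \mathbf{v}_5 &= \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_4 \end{aligned}$$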

The following is a summary of the steps that we followed in our last example to solve the problem posed above. 
Basis for Span(S) 

Step 1. Form the matrix A having vectors in $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ as column vectors.

Step 2. Reduce the matrix A to reduced row echelon form R.

Step 3. Denote the column vectors of R by $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k$.


Step 4. Identify the columns of R that contain the leading 1's. The corresponding column vectors of A form a basis for
span(S).
This completes the first part of the problem.


Step 5. Obtain a set of dependency equations by expressing each column vector of R that does not contain a leading 1
as a linear combination of preceding column vectors that do contain leading 1's.


Step 6. Replace the column vectors of R that appear in the dependency equations by the corresponding column vectors 
of A. 
This completes the second part of the problem. 
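
The six steps above can be sketched in Python as follows, using the vectors of Example 10. This is a minimal illustration, not a definitive implementation: `rref` and `basis_and_dependencies` are helper names assumed here, indices are zero-based (so index 2 means $\mathbf{v}_3$), and each dependency is reported as a dictionary from basis index to coefficient.

```python
from fractions import Fraction

def rref(matrix):
    """Return (R, pivots): the reduced row echelon form and its pivot column indices."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivots, r = [], 0
    for c in range(len(rows[0])):
        pivot_row = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot_row is None:
            continue
        rows[r], rows[pivot_row] = rows[pivot_row], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows, pivots

def basis_and_dependencies(vectors):
    """Steps 1-6: pivot indices give the basis; other columns give dependency equations."""
    A = [list(row) for row in zip(*vectors)]      # Step 1: vectors become columns
    R, pivots = rref(A)                           # Steps 2-4
    dependencies = {}
    for j in range(len(vectors)):                 # Steps 5-6
        if j not in pivots:
            # Column j of R lists the coefficients on the pivot columns.
            dependencies[j] = {p: R[i][j] for i, p in enumerate(pivots) if R[i][j] != 0}
    return pivots, dependencies

S = [(1, -2, 0, 3), (2, -5, -3, 6), (0, 1, 3, 0), (2, -1, 4, -7), (5, -8, 1, 2)]
pivots, dependencies = basis_and_dependencies(S)
print(pivots)
for j, combo in dependencies.items():
    terms = " + ".join(f"({c})*v{p + 1}" for p, c in combo.items())
    print(f"v{j + 1} = {terms}")
```

Because the coefficients are read from R but the equations are stated for the original vectors, the code relies on the fact that row operations preserve dependency relationships among columns.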


Concept Review 


Row vectors 


Column vectors 


Row space 


Column space 


Null space 


General solution 


Particular solution 


Relationships among linear systems and row spaces, column spaces, and null spaces 


Relationships among the row space, column space, and null space of a matrix 


Dependency equations 


Skills 


• Determine whether a given vector is in the column space of a matrix; if it is, express it as a linear
combination of the column vectors of the matrix.


• Find a basis for the null space of a matrix.
• Find a basis for the row space of a matrix.
• Find a basis for the column space of a matrix.


• Find a basis for the span of a set of vectors in $R^n$.


Exercise Set 4.7 


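1. List the row vectors and column vectors of the matrix

$$\begin{bmatrix} 2 & -1 & 0 & 1 \\ 3 & 5 & 7 & -1 \\ 1 & 4 & 2 & 7 \end{bmatrix}$$


Answer:


$\mathbf{r}_1 = (2, -1, 0, 1)$, $\mathbf{r}_2 = (3, 5, 7, -1)$, $\mathbf{r}_3 = (1, 4, 2, 7)$;

$$\mathbf{c}_1 = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} -1 \\ 5 \\ 4 \end{bmatrix}, \quad \mathbf{c}_3 = \begin{bmatrix} 0 \\ 7 \\ 2 \end{bmatrix}, \quad \mathbf{c}_4 = \begin{bmatrix} 1 \\ -1 \\ 7 \end{bmatrix}$$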


2. Express the product $A\mathbf{x}$ as a linear combination of the column vectors of A.


Ge 


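3. Determine whether $\mathbf{b}$ is in the column space of A, and if so, express $\mathbf{b}$ as a linear combination of the column vectors
of A.


(b) $A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 3 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}$

(c) $A = \begin{bmatrix} 1 & -1 & 1 \\ 9 & 3 & 1 \\ 1 & 1 & 1 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 5 \\ 1 \\ -1 \end{bmatrix}$

(d) $A = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 1 & -1 \\ -1 & -1 & 1 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}$

(e) $A = \begin{bmatrix} 1 & 2 & 0 & 1 \\ 0 & 1 & 2 & 1 \\ 1 & 2 & 1 & 3 \\ 0 & 1 & 2 & 2 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 4 \\ 3 \\ 5 \\ 7 \end{bmatrix}$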
it 2.2 7 


Answer: 


° Talell-Ls 


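(b) $\mathbf{b}$ is not in the column space of A.


(c) $\begin{bmatrix} 1 \\ 9 \\ 1 \end{bmatrix} - 3 \begin{bmatrix} -1 \\ 3 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \\ -1 \end{bmatrix}$

(d) $\begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} + (t - 1) \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix} + t \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$

(e) $\begin{bmatrix} 4 \\ 3 \\ 5 \\ 7 \end{bmatrix} = -26 \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + 13 \begin{bmatrix} 2 \\ 1 \\ 2 \\ 1 \end{bmatrix} - 7 \begin{bmatrix} 0 \\ 2 \\ 1 \\ 2 \end{bmatrix} + 4 \begin{bmatrix} 1 \\ 1 \\ 3 \\ 2 \end{bmatrix}$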

4. Suppose that $x_1 = -1$, $x_2 = 2$, $x_3 = 4$, $x_4 = -3$ is a solution of a nonhomogeneous linear system $A\mathbf{x} = \mathbf{b}$ and that
the solution set of the homogeneous system $A\mathbf{x} = \mathbf{0}$ is given by the formulas

$$x_1 = -3r + 4s, \quad x_2 = r - s, \quad x_3 = r, \quad x_4 = s$$


(a) Find a vector form of the general solution of $A\mathbf{x} = \mathbf{0}$.


(b) Find a vector form of the general solution of $A\mathbf{x} = \mathbf{b}$.


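5. In parts (a)-(d), find the vector form of the general solution of the given linear system $A\mathbf{x} = \mathbf{b}$; then use that result to
find the vector form of the general solution of $A\mathbf{x} = \mathbf{0}$.

(a) $$\begin{aligned} x_1 - 3x_2 &= 1 \\ 2x_1 - 6x_2 &= 2 \end{aligned}$$

(b) $$\begin{aligned} x_1 + x_2 + 2x_3 &= 5 \\ x_1 + x_3 &= -2 \\ 2x_1 + x_2 + 3x_3 &= 3 \end{aligned}$$

(c) $$\begin{aligned} x_1 - 2x_2 + x_3 + 2x_4 &= -1 \\ 2x_1 - 4x_2 + 2x_3 + 4x_4 &= -2 \\ -x_1 + 2x_2 - x_3 - 2x_4 &= 1 \\ 3x_1 - 6x_2 + 3x_3 + 6x_4 &= -3 \end{aligned}$$

(d) $$\begin{aligned} x_1 + 2x_2 - 3x_3 + x_4 &= 4 \\ -2x_1 + x_2 + 2x_3 + x_4 &= -1 \\ -x_1 + 3x_2 - x_3 + 2x_4 &= 3 \\ 4x_1 - 7x_2 - 5x_4 &= -5 \end{aligned}$$

Answer:

(a) $\begin{bmatrix} 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 3 \\ 1 \end{bmatrix}; \quad t \begin{bmatrix} 3 \\ 1 \end{bmatrix}$

(b) $\begin{bmatrix} -2 \\ 7 \\ 0 \end{bmatrix} + t \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}; \quad t \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}$

(c) $\begin{bmatrix} -1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + r \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + s \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} -2 \\ 0 \\ 0 \\ 1 \end{bmatrix}; \quad r \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + s \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} -2 \\ 0 \\ 0 \\ 1 \end{bmatrix}$

(d) $\begin{bmatrix} \frac{6}{5} \\ \frac{7}{5} \\ 0 \\ 0 \end{bmatrix} + s \begin{bmatrix} \frac{7}{5} \\ \frac{4}{5} \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} \frac{1}{5} \\ -\frac{3}{5} \\ 0 \\ 1 \end{bmatrix}; \quad s \begin{bmatrix} \frac{7}{5} \\ \frac{4}{5} \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} \frac{1}{5} \\ -\frac{3}{5} \\ 0 \\ 1 \end{bmatrix}$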


6. Find a basis for the null space of A.
(a) $A = \begin{bmatrix} 1 & -1 & 3 \\ 5 & -4 & -4 \\ 7 & -6 & 2 \end{bmatrix}$
(b) 20 =1 
A=/4 
0 


(c) 1 432 
A=| 2130 
oe Me ee 
(d) Tse Sg Ss 
Sie Rel 
a) ee ees ee 
a a. ey ae 
(e) 1-3 2 2 #1 
Oy se Os 
AS BaF ao 4 
eee ae a 
= ee ee 


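7. In each part, a matrix in row echelon form is given. By inspection, find bases for the row and column spaces of A.


(a) $\begin{bmatrix} 1 & 0 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & -3 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 2 & 4 & 5 \\ 0 & 1 & -3 & 0 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 2 & -1 & 5 \\ 0 & 1 & 4 & 3 \\ 0 & 0 & 1 & -7 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Answer:

(a) $\mathbf{r}_1 = [1 \;\; 0 \;\; 2]$, $\mathbf{r}_2 = [0 \;\; 0 \;\; 1]$, $\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}$


(b) $\mathbf{r}_1 = [1 \;\; -3 \;\; 0 \;\; 0]$, $\mathbf{r}_2 = [0 \;\; 1 \;\; 0 \;\; 0]$, $\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix}$


(c) $\mathbf{r}_1 = [1 \;\; 2 \;\; 4 \;\; 5]$, $\mathbf{r}_2 = [0 \;\; 1 \;\; -3 \;\; 0]$, $\mathbf{r}_3 = [0 \;\; 0 \;\; 1 \;\; -3]$, $\mathbf{r}_4 = [0 \;\; 0 \;\; 0 \;\; 1]$,

$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_3 = \begin{bmatrix} 4 \\ -3 \\ 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_4 = \begin{bmatrix} 5 \\ 0 \\ -3 \\ 1 \\ 0 \end{bmatrix}$


(d) $\mathbf{r}_1 = [1 \;\; 2 \;\; -1 \;\; 5]$, $\mathbf{r}_2 = [0 \;\; 1 \;\; 4 \;\; 3]$, $\mathbf{r}_3 = [0 \;\; 0 \;\; 1 \;\; -7]$, $\mathbf{r}_4 = [0 \;\; 0 \;\; 0 \;\; 1]$,

$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_3 = \begin{bmatrix} -1 \\ 4 \\ 1 \\ 0 \end{bmatrix}$, $\mathbf{c}_4 = \begin{bmatrix} 5 \\ 3 \\ -7 \\ 1 \end{bmatrix}$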

8. For the matrices in Exercise 6, find a basis for the row space of A by reducing the matrix to row echelon form. 


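9. By inspection, find a basis for the row space and a basis for the column space of each matrix.


(a) $\begin{bmatrix} 1 & 0 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & -3 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 2 & 4 & 5 \\ 0 & 1 & -3 & 0 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 2 & -1 & 5 \\ 0 & 1 & 4 & 3 \\ 0 & 0 & 1 & -7 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Answer:

(a) $\mathbf{r}_1 = [1 \;\; 0 \;\; 2]$; $\mathbf{r}_2 = [0 \;\; 0 \;\; 1]$; $\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}$


(b) $\mathbf{r}_1 = [1 \;\; -3 \;\; 0 \;\; 0]$; $\mathbf{r}_2 = [0 \;\; 1 \;\; 0 \;\; 0]$; $\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_2 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix}$


(c) $\mathbf{r}_1 = [1 \;\; 2 \;\; 4 \;\; 5]$; $\mathbf{r}_2 = [0 \;\; 1 \;\; -3 \;\; 0]$; $\mathbf{r}_3 = [0 \;\; 0 \;\; 1 \;\; -3]$; $\mathbf{r}_4 = [0 \;\; 0 \;\; 0 \;\; 1]$;

$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_3 = \begin{bmatrix} 4 \\ -3 \\ 1 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_4 = \begin{bmatrix} 5 \\ 0 \\ -3 \\ 1 \\ 0 \end{bmatrix}$


(d) $\mathbf{r}_1 = [1 \;\; 2 \;\; -1 \;\; 5]$; $\mathbf{r}_2 = [0 \;\; 1 \;\; 4 \;\; 3]$; $\mathbf{r}_3 = [0 \;\; 0 \;\; 1 \;\; -7]$; $\mathbf{r}_4 = [0 \;\; 0 \;\; 0 \;\; 1]$;

$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_3 = \begin{bmatrix} -1 \\ 4 \\ 1 \\ 0 \end{bmatrix}$; $\mathbf{c}_4 = \begin{bmatrix} 5 \\ 3 \\ -7 \\ 1 \end{bmatrix}$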


10. For the matrices in Exercise 6, find a basis for the row space of A consisting entirely of row vectors of A. 
11. Find a basis for the subspace of $R^4$ spanned by the given vectors.
(a) (1, 1, −4, −3), (2, 0, 2, −2), (2, −1, 3, 2)


(b) (−1, 1, −2, 0), (3, 3, 6, 0), (9, 0, 0, 3)
(c) (1, 1, 0, 0), (0, 0, 1, 1), (−2, 0, 2, 2), (0, −3, 0, 3)


Answer:
(a) (1, 1, −4, −3), (0, 1, −5, −2), (0, 0, 1, −1/2)
(b) (1, −1, 2, 0), (0, 1, 0, 0), (0, 0, 1, −1/6)


(c) (1, 1, 0, 0), (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1)


12. Find a subset of the vectors that forms a basis for the space spanned by the vectors; then express each vector that is 
not in the basis as a linear combination of the basis vectors. 
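(a) $\mathbf{v}_1 = (1, 0, 1, 1)$, $\mathbf{v}_2 = (-3, 3, 7, 1)$, $\mathbf{v}_3 = (-1, 3, 9, 3)$, $\mathbf{v}_4 = (-5, 3, 5, -1)$
(b) $\mathbf{v}_1 = (1, -2, 0, 3)$, $\mathbf{v}_2 = (2, -4, 0, 6)$, $\mathbf{v}_3 = (-1, 1, 2, 0)$, $\mathbf{v}_4 = (0, -1, 2, 3)$
(c) $\mathbf{v}_1 = (1, -1, 5, 2)$, $\mathbf{v}_2 = (-2, 3, 1, 0)$, $\mathbf{v}_3 = (4, -5, 9, 4)$, $\mathbf{v}_4 = (0, 4, 2, -3)$, $\mathbf{v}_5 = (-7, 18, 2, -8)$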


13. Prove that the row vectors of an $n \times n$ invertible matrix A form a basis for $R^n$.


14. Construct a matrix whose null space consists of all linear combinations of the vectors 


1 2 
vj= - and v2= S 
2 4 
15. (a) Let

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

Show that relative to an xyz-coordinate system in 3-space the null space of A consists of all points on the z-axis
and that the column space consists of all points in the xy-plane (see the accompanying figure).


(b) Find a 3 × 3 matrix whose null space is the x-axis and whose column space is the yz-plane.


Figure Ex-15 (labels: null space of A; column space of A)


16. Find a 3 x 3 matrix whose null space is 
(a) a point. 
(b) a line. 
(c) a plane.
17. (a) Find all 2 × 2 matrices whose null space is the line $3x - 5y = 0$.
(b) Sketch the null spaces of the following matrices: 


ala sh Lo 5} 
e-[3 i} 2=[0 0 


Answer: 


(a) $\begin{bmatrix} 3a & -5a \\ 3b & -5b \end{bmatrix}$ for all real numbers a, b not both 0.


(b) Since A and B are invertible, their null spaces are the origin. The null space of C is the line $3x + y = 0$. The null
space of D is the entire xy-plane.


18. The equation $x_1 + x_2 + x_3 = 1$ can be viewed as a linear system of one equation in three unknowns. Express its
general solution as a particular solution plus the general solution of the corresponding homogeneous system.
[Suggestion: Write the vectors in column form.]


19. Suppose that A and B are $n \times n$ matrices and A is invertible. Invent and prove a theorem that describes how the row
spaces of AB and B are related.


True-False Exercises 

In parts (a)-(j) determine whether the statement is true or false, and justify your answer. 

(a) The span of $\mathbf{v}_1, \ldots, \mathbf{v}_n$ is the column space of the matrix whose column vectors are $\mathbf{v}_1, \ldots, \mathbf{v}_n$.
Answer: 


True 


(b) The column space of a matrix A is the set of solutions of $A\mathbf{x} = \mathbf{b}$.
Answer: 


False 


(c) If R is the reduced row echelon form of A, then those column vectors of R that contain the leading 1's form a basis for 
the column space of A. 


Answer: 


False 


(d) The set of nonzero row vectors of a matrix A is a basis for the row space of A. 
Answer: 


False 


(e) If A and B are $n \times n$ matrices that have the same row space, then A and B have the same column space.
Answer: 


False 


(f) If E is an $m \times m$ elementary matrix and A is an $m \times n$ matrix, then the null space of EA is the same as the null space
of A.


Answer: 


True 


(g) If E is an $m \times m$ elementary matrix and A is an $m \times n$ matrix, then the row space of EA is the same as the row space
of A.


Answer: 


True 


(h) If E is an $m \times m$ elementary matrix and A is an $m \times n$ matrix, then the column space of EA is the same as the column
space of A.


Answer: 


False 


(i) The system $A\mathbf{x} = \mathbf{b}$ is inconsistent if and only if $\mathbf{b}$ is not in the column space of A.
Answer: 


True 


(j) There is an invertible matrix A and a singular matrix B such that the row spaces of A and B are the same. 
Answer: 


False 




4.8 Rank, Nullity, and the Fundamental Matrix 
Spaces 


In the last section we investigated relationships between a system of linear equations and the row space, column 
space, and null space of its coefficient matrix. In this section we will be concerned with the dimensions of those 
spaces. The results we obtain will provide a deeper insight into the relationship between a linear system and its
coefficient matrix. 


Row and Column Spaces Have Equal Dimensions 


In Examples 6 and 7 of Section 4.7 we found that the row and column spaces of the matrix 


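$$A = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 2 & -6 & 9 & -1 & 8 & 2 \\ 2 & -6 & 9 & -1 & 9 & 7 \\ -1 & 3 & -4 & 2 & -5 & -4 \end{bmatrix}$$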


both have three basis vectors and hence are both three-dimensional. The fact that these spaces have the same 
dimension is not accidental, but rather a consequence of the following theorem. 


THEOREM 4.8.1 


The row space and column space of a matrix A have the same dimension. 


Proof Let R be any row echelon form of A. It follows from Theorem 4.7.4 and Theorem 4.7.6b that

$$\begin{aligned} \dim(\text{row space of } A) &= \dim(\text{row space of } R) \\ \dim(\text{column space of } A) &= \dim(\text{column space of } R) \end{aligned}$$


so it suffices to show that the row and column spaces of R have the same dimension. But the dimension of the row 
space of R is the number of nonzero rows, and by Theorem 4.7.5 the dimension of the column space of R is the 
number of leading 1's. Since these two numbers are the same, the row and column space have the same dimension. 


Rank and Nullity 


The dimensions of the row space, column space, and null space of a matrix are such important numbers that there is 
some notation and terminology associated with them. 


DEFINITION 1 


The common dimension of the row space and column space of a matrix A is called the rank of A and is 
denoted by rank(A); the dimension of the null space of A is called the nullity of A and is denoted by 
nullity(A). 


The proof of Theorem 4.8.1 shows that the rank 
of A can be interpreted as the number of leading 
1's in any row echelon form of A. 


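EXAMPLE 1 Rank and Nullity of a 4 × 6 Matrix


Find the rank and nullity of the matrix

$$A = \begin{bmatrix} -1 & 2 & 0 & 4 & 5 & -3 \\ 3 & -7 & 2 & 0 & 1 & 4 \\ 2 & -5 & 2 & 4 & 6 & 1 \\ 4 & -9 & 2 & -4 & -4 & 7 \end{bmatrix}$$


Solution The reduced row echelon form of A is

$$\begin{bmatrix} 1 & 0 & -4 & -28 & -37 & 13 \\ 0 & 1 & -2 & -12 & -16 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \qquad (1)$$

(verify). Since this matrix has two leading 1's, its row and column spaces are two-dimensional and
rank(A) = 2. To find the nullity of A, we must find the dimension of the solution space of the linear
system $A\mathbf{x} = \mathbf{0}$. This system can be solved by reducing its augmented matrix to reduced row echelon
form. The resulting matrix will be identical to 1, except that it will have an additional last column of
zeros, and hence the corresponding system of equations will be

$$\begin{aligned} x_1 - 4x_3 - 28x_4 - 37x_5 + 13x_6 &= 0 \\ x_2 - 2x_3 - 12x_4 - 16x_5 + 5x_6 &= 0 \end{aligned}$$

Solving these equations for the leading variables yields

$$\begin{aligned} x_1 &= 4x_3 + 28x_4 + 37x_5 - 13x_6 \\ x_2 &= 2x_3 + 12x_4 + 16x_5 - 5x_6 \end{aligned} \qquad (2)$$

from which we obtain the general solution

$$\begin{aligned} x_1 &= 4r + 28s + 37t - 13u \\ x_2 &= 2r + 12s + 16t - 5u \\ x_3 &= r \\ x_4 &= s \\ x_5 &= t \\ x_6 &= u \end{aligned}$$

or in column vector form

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = r \begin{bmatrix} 4 \\ 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + s \begin{bmatrix} 28 \\ 12 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t \begin{bmatrix} 37 \\ 16 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} + u \begin{bmatrix} -13 \\ -5 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \qquad (3)$$

Because the four vectors on the right side of 3 form a basis for the solution space, nullity(A) = 4.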
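
The computation in Example 1 can be sketched in Python: the rank is the number of leading 1's in the reduced row echelon form, and the nullity is the number of remaining (free) columns. The helper names `rref` and `rank_and_nullity` are assumptions introduced here for illustration.

```python
from fractions import Fraction

def rref(matrix):
    """Return (R, pivots): the reduced row echelon form and its pivot column indices."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivots, r = [], 0
    for c in range(len(rows[0])):
        pivot_row = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot_row is None:
            continue
        rows[r], rows[pivot_row] = rows[pivot_row], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows, pivots

def rank_and_nullity(matrix):
    """rank = number of leading 1's; nullity = number of columns without one."""
    _, pivots = rref(matrix)
    rank = len(pivots)
    return rank, len(matrix[0]) - rank

A = [[-1, 2, 0, 4, 5, -3],
     [3, -7, 2, 0, 1, 4],
     [2, -5, 2, 4, 6, 1],
     [4, -9, 2, -4, -4, 7]]

R, _ = rref(A)
print([[int(x) for x in row] for row in R])   # compare with matrix (1)
print(rank_and_nullity(A))
```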


EXAMPLE 2 Maximum Value for Rank


What is the maximum possible rank of an $m \times n$ matrix A that is not square?


Solution Since the row vectors of A lie in $R^n$ and the column vectors in $R^m$, the row space of A is
at most n-dimensional and the column space is at most m-dimensional. Since the rank of A is the
common dimension of its row and column space, it follows that the rank is at most the smaller of m
and n. We denote this by writing

$$\text{rank}(A) \le \min(m, n)$$

in which $\min(m, n)$ is the minimum of m and n.


The following theorem establishes an important relationship between the rank and nullity of a matrix. 


THEOREM 4.8.2 Dimension Theorem for Matrices 


If A is a matrix with n columns, then 


$$\text{rank}(A) + \text{nullity}(A) = n \qquad (4)$$


Proof Since A has n columns, the homogeneous linear system $A\mathbf{x} = \mathbf{0}$ has n unknowns (variables). These fall into
two distinct categories: the leading variables and the free variables. Thus,

$$\begin{bmatrix} \text{number of leading} \\ \text{variables} \end{bmatrix} + \begin{bmatrix} \text{number of free} \\ \text{variables} \end{bmatrix} = n$$

But the number of leading variables is the same as the number of leading 1's in the reduced row echelon form of A,
which is the rank of A; and the number of free variables is the same as the number of parameters in the general
solution of $A\mathbf{x} = \mathbf{0}$, which is the nullity of A. This yields Formula 4.


EXAMPLE 3 The Sum of Rank and Nullity 


The matrix 


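$$A = \begin{bmatrix} -1 & 2 & 0 & 4 & 5 & -3 \\ 3 & -7 & 2 & 0 & 1 & 4 \\ 2 & -5 & 2 & 4 & 6 & 1 \\ 4 & -9 & 2 & -4 & -4 & 7 \end{bmatrix}$$

has 6 columns, so

$$\text{rank}(A) + \text{nullity}(A) = 6$$

This is consistent with Example 1, where we showed that

$$\text{rank}(A) = 2 \quad \text{and} \quad \text{nullity}(A) = 4$$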


The following theorem, which summarizes results already obtained, interprets rank and nullity in the context of a 
homogeneous linear system. 


THEOREM 4.8.3 


If A is an $m \times n$ matrix, then


(a) rank(A) = the number of leading variables in the general solution of $A\mathbf{x} = \mathbf{0}$.
(b) nullity(A) = the number of parameters in the general solution of $A\mathbf{x} = \mathbf{0}$.


EXAMPLE 4 Number of Parameters in a General Solution


Find the number of parameters in the general solution of $A\mathbf{x} = \mathbf{0}$ if A is a 5 × 7 matrix of rank 3.


Solution From 4,

$$\text{nullity}(A) = n - \text{rank}(A) = 7 - 3 = 4$$


Thus there are four parameters. 


Equivalence Theorem 


In Theorem 2.3.8 we listed seven results that are equivalent to the invertibility of a square matrix A. We are now in 
a position to add eight more results to that list to produce a single theorem that summarizes most of the topics we 
have covered thus far. 


THEOREM 4.8.4 Equivalent Statements 


If $A$ is an $n \times n$ matrix, then the following statements are equivalent.
(a) $A$ is invertible.

(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.

(c) The reduced row echelon form of $A$ is $I_n$.

(d) $A$ is expressible as a product of elementary matrices.

(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.

(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.

(g) $\det(A) \ne 0$.

(h) The column vectors of $A$ are linearly independent.

(i) The row vectors of $A$ are linearly independent.

(j) The column vectors of $A$ span $R^n$.

(k) The row vectors of $A$ span $R^n$.

(l) The column vectors of $A$ form a basis for $R^n$.

(m) The row vectors of $A$ form a basis for $R^n$.

(n) $A$ has rank $n$.

(o) $A$ has nullity 0.


Proof The equivalence of (h) through (m) follows from Theorem 4.5.4 (we omit the details). To complete the
proof we will show that (b), (n), and (o) are equivalent by proving the chain of implications


$$(b) \Rightarrow (o) \Rightarrow (n) \Rightarrow (b)$$


(b) $\Rightarrow$ (o) If $A\mathbf{x} = \mathbf{0}$ has only the trivial solution, then there are no parameters in that solution, so $\operatorname{nullity}(A) = 0$
by Theorem 4.8.3b.


(o) $\Rightarrow$ (n) Theorem 4.8.2.


(n) $\Rightarrow$ (b) If $A$ has rank $n$, then Theorem 4.8.3a implies that there are $n$ leading variables (hence no free variables)
in the general solution of $A\mathbf{x} = \mathbf{0}$. This leaves the trivial solution as the only possibility.


Overdetermined and Underdetermined Systems 


In many applications the equations in a linear system correspond to physical constraints or conditions that must be 
satisfied. In general, the most desirable systems are those that have the same number of constraints as unknowns, 
since such systems often have a unique solution. Unfortunately, it is not always possible to match the number of 
constraints and unknowns, so researchers are often faced with linear systems that have more constraints than 
unknowns, called overdetermined systems, or with fewer constraints than unknowns, called underdetermined 
systems. The following two theorems will help us to analyze both overdetermined and underdetermined systems. 


In engineering and other applications, the 
occurrence of an overdetermined or 
underdetermined linear system often signals that 
one or more variables were omitted in formulating 
the problem or that extraneous variables were 
included. This often leads to some kind of 
undesirable physical result. 


THEOREM 4.8.5 


If $A\mathbf{x} = \mathbf{b}$ is a consistent linear system of $m$ equations in $n$ unknowns, and if $A$ has rank $r$, then the general
solution of the system contains $n - r$ parameters.


Proof It follows from Theorem 4.7.2 that the number of parameters is equal to the nullity of $A$, which, by
Theorem 4.8.2, is $n - r$.


THEOREM 4.8.6 


Let $A$ be an $m \times n$ matrix.

(a) (Overdetermined Case) If $m > n$, then the linear system $A\mathbf{x} = \mathbf{b}$ is inconsistent for at least one vector
$\mathbf{b}$ in $R^m$.

(b) (Underdetermined Case) If $m < n$, then for each vector $\mathbf{b}$ in $R^m$ the linear system $A\mathbf{x} = \mathbf{b}$ is either
inconsistent or has infinitely many solutions.


Proof (a) Assume that $m > n$, in which case the column vectors of $A$ cannot span $R^m$ (fewer vectors than the
dimension of $R^m$). Thus, there is at least one vector $\mathbf{b}$ in $R^m$ that is not in the column space of $A$, and for that $\mathbf{b}$ the
system $A\mathbf{x} = \mathbf{b}$ is inconsistent by Theorem 4.7.1.


Proof (b) Assume that $m < n$. For each vector $\mathbf{b}$ in $R^m$ there are two possibilities: either the system $A\mathbf{x} = \mathbf{b}$ is
consistent or it is inconsistent. If it is inconsistent, then the proof is complete. If it is consistent, then Theorem 4.8.5
implies that the general solution has $n - r$ parameters, where $r = \operatorname{rank}(A)$. But $\operatorname{rank}(A)$ is at most the smaller of $m$ and $n$
(which is $m$ here), so


$$n - r \ge n - m > 0$$
This means that the general solution has at least one parameter and hence there are infinitely many solutions.


EXAMPLE 5 Overdetermined and Underdetermined Systems 


(a) What can you say about the solutions of an overdetermined system $A\mathbf{x} = \mathbf{b}$ of 7 equations in 5
unknowns in which $A$ has rank $r = 4$?


(b) What can you say about the solutions of an underdetermined system $A\mathbf{x} = \mathbf{b}$ of 5 equations in 7
unknowns in which $A$ has rank $r = 4$?


Solution

(a) The system is consistent for some vector $\mathbf{b}$ in $R^7$, and for any such $\mathbf{b}$ the number of parameters in
the general solution is $n - r = 5 - 4 = 1$.

(b) The system may be consistent or inconsistent, but if it is consistent for the vector $\mathbf{b}$ in $R^5$, then the
general solution has $n - r = 7 - 4 = 3$ parameters.


EXAMPLE 6 An Overdetermined System


The linear system


$$\begin{aligned} x_1 - 2x_2 &= b_1 \\ x_1 - x_2 &= b_2 \\ x_1 + x_2 &= b_3 \\ x_1 + 2x_2 &= b_4 \\ x_1 + 3x_2 &= b_5 \end{aligned}$$


is overdetermined, so it cannot be consistent for all possible values of $b_1$, $b_2$, $b_3$, $b_4$, and $b_5$. Exact
conditions under which the system is consistent can be obtained by solving the linear system by Gauss-Jordan
elimination. We leave it for you to show that the augmented matrix is row equivalent to


$$\left[\begin{array}{cc|c} 1 & 0 & 2b_2 - b_1 \\ 0 & 1 & b_2 - b_1 \\ 0 & 0 & b_3 - 3b_2 + 2b_1 \\ 0 & 0 & b_4 - 4b_2 + 3b_1 \\ 0 & 0 & b_5 - 5b_2 + 4b_1 \end{array}\right] \tag{5}$$

Thus, the system is consistent if and only if $b_1$, $b_2$, $b_3$, $b_4$, and $b_5$ satisfy the conditions

$$\begin{aligned} 2b_1 - 3b_2 + b_3 &= 0 \\ 3b_1 - 4b_2 + b_4 &= 0 \\ 4b_1 - 5b_2 + b_5 &= 0 \end{aligned}$$


Solving this homogeneous linear system yields
$$b_1 = 5r - 4s, \quad b_2 = 4r - 3s, \quad b_3 = 2r - s, \quad b_4 = r, \quad b_5 = s$$


where r and s are arbitrary. 


Remark The coefficient matrix for the linear system in the last example has $n = 2$ columns, and it has rank $r = 2$
because there are two nonzero rows in its reduced row echelon form. This implies that when the system is
consistent its general solution will contain $n - r = 0$ parameters; that is, the solution will be unique. With a
moment's thought, you should be able to see that this is so from 5.
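As an illustration of Example 6 and the remark, here is a small sketch (the function names are hypothetical, not from the text). It uses the first two equations to find the only candidate solution and then checks the remaining three equations, which is equivalent to the three consistency conditions. It also generates right-hand sides from the parameters $r$ and $s$.

```python
from fractions import Fraction

def solve_example_6(b):
    """Return the unique solution (x1, x2) if the system is consistent for b, else None."""
    b1, b2, b3, b4, b5 = (Fraction(v) for v in b)
    # x1 - 2*x2 = b1 and x1 - x2 = b2 determine the only possible solution.
    x2 = b2 - b1
    x1 = 2 * b2 - b1
    # The other three equations hold exactly when the consistency conditions do.
    consistent = (x1 + x2 == b3) and (x1 + 2 * x2 == b4) and (x1 + 3 * x2 == b5)
    return (x1, x2) if consistent else None

def b_from_parameters(r, s):
    """Right-hand side built from the parametric form b1 = 5r - 4s, ..., b5 = s."""
    return (5 * r - 4 * s, 4 * r - 3 * s, 2 * r - s, r, s)

print(solve_example_6(b_from_parameters(1, 2)))   # a consistent right-hand side
print(solve_example_6((1, 0, 0, 0, 0)))           # an inconsistent right-hand side
```

When a solution is returned it is the only one, in line with the remark that the general solution has $n - r = 0$ parameters.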


The Fundamental Spaces of a Matrix 


There are six important vector spaces associated with a matrix $A$ and its transpose $A^T$:

row space of $A$        row space of $A^T$
column space of $A$     column space of $A^T$
null space of $A$       null space of $A^T$


However, transposing a matrix converts row vectors into column vectors and conversely, so except for a difference
in notation, the row space of $A^T$ is the same as the column space of $A$, and the column space of $A^T$ is the same as
the row space of $A$. Thus, of the six spaces listed above, only the following four are distinct:


row space of $A$        column space of $A$
null space of $A$       null space of $A^T$


If $A$ is an $m \times n$ matrix, then the row space and
null space of $A$ are subspaces of $R^n$, and the
column space of $A$ and the null space of $A^T$ are
subspaces of $R^m$.


These are called the fundamental spaces of a matrix $A$. We will conclude this section by discussing how these four
subspaces are related.


Let us focus for a moment on the matrix $A^T$. Since the row space and column space of a matrix have the same
dimension, and since transposing a matrix converts its columns to rows and its rows to columns, the following
result should not be surprising.


THEOREM 4.8.7 


If $A$ is any matrix, then $\operatorname{rank}(A) = \operatorname{rank}(A^T)$.


Proof
$$\operatorname{rank}(A) = \dim(\text{row space of } A) = \dim(\text{column space of } A^T) = \operatorname{rank}(A^T)$$


This result has some important implications. For example, if $A$ is an $m \times n$ matrix, then applying Formula 4 to the
matrix $A^T$ and using the fact that this matrix has $m$ columns yields


$$\operatorname{rank}(A^T) + \operatorname{nullity}(A^T) = m$$


which, by virtue of Theorem 4.8.7, can be rewritten as
$$\operatorname{rank}(A) + \operatorname{nullity}(A^T) = m \tag{6}$$

This alternative form of Formula 4 in Theorem 4.8.2 makes it possible to express the dimensions of all four
fundamental spaces in terms of the size and rank of $A$. Specifically, if $\operatorname{rank}(A) = r$, then

$$\begin{aligned} \dim[\operatorname{row}(A)] &= r & \dim[\operatorname{col}(A)] &= r \\ \dim[\operatorname{null}(A)] &= n - r & \dim[\operatorname{null}(A^T)] &= m - r \end{aligned} \tag{7}$$
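The dimension formulas for the four fundamental spaces are easy to tabulate once the rank is known. The sketch below is illustrative only; the names `rank` and `fundamental_dimensions` and the sample matrix are assumptions, not taken from the text. It computes the rank by exact row reduction, confirms that the transpose has the same rank, and returns the four dimensions.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows), computed by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivot_row = 0
    for col in range(len(rows[0])):
        if pivot_row == len(rows):
            break
        pivot = next((r for r in range(pivot_row, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        for r in range(pivot_row + 1, len(rows)):
            factor = rows[r][col] / rows[pivot_row][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return pivot_row

def fundamental_dimensions(matrix):
    """Dimensions of row(A), col(A), null(A), and null(A^T) for an m x n matrix A."""
    m, n = len(matrix), len(matrix[0])
    r = rank(matrix)
    transpose = [list(col) for col in zip(*matrix)]
    assert rank(transpose) == r  # rank(A) = rank(A^T)
    return {"row": r, "col": r, "null": n - r, "null_T": m - r}

sample = [[1, 2, 3],
          [2, 4, 6]]  # a 2 x 3 matrix of rank 1, chosen only for illustration
print(fundamental_dimensions(sample))
```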
The four formulas in 7 provide an algebraic relationship between the size of a matrix and the dimensions of its
fundamental spaces. Our next objective is to find a geometric relationship between the fundamental spaces
themselves. For this purpose recall from Theorem 3.4.3 that if $A$ is an $m \times n$ matrix, then the null space of $A$
consists of those vectors that are orthogonal to each of the row vectors of $A$. To develop that idea in more detail, we
make the following definition.


DEFINITION 2 


If $W$ is a subspace of $R^n$, then the set of all vectors in $R^n$ that are orthogonal to every vector in $W$ is called
the orthogonal complement of $W$ and is denoted by the symbol $W^\perp$.


The following theorem lists three basic properties of orthogonal complements. We will omit the formal proof 
because a more general version of this theorem will be given later in the text. 


THEOREM 4.8.8 


If $W$ is a subspace of $R^n$, then:
(a) $W^\perp$ is a subspace of $R^n$.
(b) The only vector common to $W$ and $W^\perp$ is $\mathbf{0}$.
(c) The orthogonal complement of $W^\perp$ is $W$.


EXAMPLE 7 Orthogonal Complements


In $R^2$ the orthogonal complement of a line $W$ through the origin is the line through the origin that is
perpendicular to $W$ (Figure 4.8.1a); and in $R^3$ the orthogonal complement of a plane $W$ through the
origin is the line through the origin that is perpendicular to that plane (Figure 4.8.1b).


(a) (b)
Figure 4.8.1


Explain why $\{\mathbf{0}\}$ and $R^n$ are orthogonal
complements.


A Geometric Link Between the Fundamental Spaces 


The following theorem provides a geometric link between the fundamental spaces of a matrix. Part (a) is essentially 
a restatement of Theorem 3.4.3 in the language of orthogonal complements, and part (b), whose proof is left as an 
exercise, follows from part (a). The essential idea of the theorem is illustrated in Figure 4.8.2. 


THEOREM 4.8.9 


If $A$ is an $m \times n$ matrix, then:
(a) The null space of $A$ and the row space of $A$ are orthogonal complements in $R^n$.


(b) The null space of $A^T$ and the column space of $A$ are orthogonal complements in $R^m$.


Figure 4.8.2 
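Part (a) of Theorem 4.8.9 can be observed numerically. The sketch below is an illustration under assumed names (`rref`, `null_space_basis`, `dot`) and an assumed sample matrix: it builds a basis for the null space from the reduced row echelon form and checks that every basis vector is orthogonal to every row of the matrix.

```python
from fractions import Fraction

def rref(matrix):
    """Return (reduced row echelon form, list of pivot column indices)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivots = []
    pivot_row = 0
    for col in range(len(rows[0])):
        if pivot_row == len(rows):
            break
        pivot = next((r for r in range(pivot_row, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        lead = rows[pivot_row][col]
        rows[pivot_row] = [x / lead for x in rows[pivot_row]]  # make the leading entry 1
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        pivots.append(col)
        pivot_row += 1
    return rows, pivots

def null_space_basis(matrix):
    """One basis vector per free variable, obtained by setting that variable to 1."""
    reduced, pivots = rref(matrix)
    n = len(matrix[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -reduced[i][free]
        basis.append(v)
    return basis

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[1, 2, 0, 1],
     [2, 4, 1, 4],
     [3, 6, 1, 5]]  # sample matrix chosen for illustration
basis = null_space_basis(A)
print(all(dot(row, v) == 0 for row in A for v in basis))
```

Because each null space vector is orthogonal to every row, it is orthogonal to every linear combination of the rows, that is, to the whole row space.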


More on the Equivalence Theorem 


As our final result in this section, we will add two more statements to Theorem 4.8.4. We leave the proof that those 
statements are equivalent to the rest as an exercise. 


THEOREM 4.8.10 Equivalent Statements 


If $A$ is an $n \times n$ matrix, then the following statements are equivalent.
(a) $A$ is invertible.

(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.

(c) The reduced row echelon form of $A$ is $I_n$.

(d) $A$ is expressible as a product of elementary matrices.

(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.

(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.

(g) $\det(A) \ne 0$.

(h) The column vectors of $A$ are linearly independent.

(i) The row vectors of $A$ are linearly independent.

(j) The column vectors of $A$ span $R^n$.

(k) The row vectors of $A$ span $R^n$.

(l) The column vectors of $A$ form a basis for $R^n$.

(m) The row vectors of $A$ form a basis for $R^n$.

(n) $A$ has rank $n$.

(o) $A$ has nullity 0.

(p) The orthogonal complement of the null space of $A$ is $R^n$.

(q) The orthogonal complement of the row space of $A$ is $\{\mathbf{0}\}$.


Applications of Rank 


The advent of the Internet has stimulated research on finding efficient methods for transmitting large amounts of
digital data over communications lines with limited bandwidths. Digital data are commonly stored in matrix form,
and many techniques for improving transmission speed use the rank of a matrix in some way. Rank plays a role
because it measures the "redundancy" in a matrix in the sense that if $A$ is an $m \times n$ matrix of rank $k$, then $n - k$ of
the column vectors and $m - k$ of the row vectors can be expressed in terms of $k$ linearly independent column or
row vectors. The essential idea in many data compression schemes is to approximate the original data set by a data
set with smaller rank that conveys nearly the same information, then eliminate redundant vectors in the
approximating set to speed up the transmission time.


Concept Review 

• Rank

• Nullity

• Dimension Theorem

• Overdetermined system

• Underdetermined system

• Fundamental spaces of a matrix

• Relationships among the fundamental spaces

• Orthogonal complement

• Equivalent characterizations of invertible matrices


Skills
• Find the rank and nullity of a matrix.

• Find the dimension of the row space of a matrix.


Exercise Set 4.8 


1. Verify that $\operatorname{rank}(A) = \operatorname{rank}(A^T)$.


Answer:


$\operatorname{rank}(A) = \operatorname{rank}(A^T) = 2$


2. Find the rank and nullity of the matrix; then verify that the values obtained satisfy Formula 4 in the Dimension 
Theorem. 


(a) 1 =1 3 
A=|5 —4 —4 
j =6 3 
(b) 0 I 
A=|4 0 -2 
00 O 
(c) 1453 
A=| 2130 
ey a ae 
(d) 145 6 9 
<2. 1 4 a4 
sl Ol =2: =] 
2 3 5 7 8 


(e) i a ee 
0 3 6 O =—3 

A=| 2 -3 -2 4 4 

3 =—6 0 6 5 


3. In each part of Exercise 2, use the results obtained to find the number of leading variables and the number of
parameters in the solution of $A\mathbf{x} = \mathbf{0}$ without solving the system.


Answer:


(a) 2; 1
(b) 1; 2
(c) 2; 2
(d) 2; 3
(e) 3; 2


4. In each part, use the information in the table to find the dimension of the row space of $A$, column space of $A$,
null space of $A$, and null space of $A^T$.


(a) | 6 | © | @M] © (g) 


Size of A| 3x 3| 3x3| 3x3] 5x9] 9x5 <i 6x2 
Rank(A) 2 


5. In each part, find the largest possible value for the rank of $A$ and the smallest possible value for the nullity of $A$.
(a) $A$ is $4 \times 4$
(b) $A$ is $3 \times 5$
(c) $A$ is $5 \times 3$


Answer: 


(a) Rank =4, nullity =0 

(b) Rank = 3, nullity = 2 

(c) Rank = 3, nullity = 0 

6. If $A$ is an $m \times n$ matrix, what is the largest possible value for its rank and the smallest possible value for its
nullity?


7. In each part, use the information in the table to determine whether the linear system $A\mathbf{x} = \mathbf{b}$ is consistent. If so,
state the number of parameters in its general solution.


              (a)     (b)     (c)     (d)     (e)     (f)     (g)
Size of A     3 × 3   3 × 3   3 × 3   5 × 9   5 × 9   4 × 4   6 × 2
Rank(A)       3       2       1       2       2       0       2
Rank[A | b]   3       3       1       2       3       0       2


10. 


1 


— 


12. 


Answer: 


(a) Yes, 0 
(b) No 

(c) Yes, 2 
(d) Yes, 7 
(e) No 

(f) Yes, 4 
(g) Yes, 0 


8. For each of the matrices in Exercise 7, find the nullity of $A$, and determine the number of parameters in the
general solution of the homogeneous linear system $A\mathbf{x} = \mathbf{0}$.


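9. What conditions must be satisfied by $b_1$, $b_2$, $b_3$, $b_4$, and $b_5$ for the overdetermined linear system
$$\begin{aligned} x_1 - 3x_2 &= b_1 \\ x_1 - 2x_2 &= b_2 \\ x_1 + x_2 &= b_3 \\ x_1 - 4x_2 &= b_4 \\ x_1 + 5x_2 &= b_5 \end{aligned}$$


to be consistent?


Answer:


$b_1 = r$, $b_2 = s$, $b_3 = 4s - 3r$, $b_4 = 2r - s$, $b_5 = 8s - 7r$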
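Let


$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}$$


Show that $A$ has rank 2 if and only if one or more of the determinants
$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad \begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix}, \quad \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}$$


is nonzero.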


. Suppose that A is a 3 x 3 matrix whose null space is a line through the origin in 3-space. Can the row or column 


space of A also be a line through the origin? Explain. 


Answer: 
No 
Discuss how the rank of $A$ varies with $t$.
(a) t a 
A=|1¢ 1 
ft. 3] 
(b) f 3 —1 


13. 


14. 


15. 
16. 


17. 


18. 


19. 


Are there values of $r$ and $s$ for which


$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & r - 2 & 2 \\ 0 & s - 1 & r + 2 \\ 0 & 0 & 3 \end{bmatrix}$$


has rank 1? Has rank 2? If so, find those values.
Answer:


Rank is 2 if $r = 2$ and $s = 1$; the rank is never 1.


Use the result in Exercise 10 to show that the set of points $(x, y, z)$ in $R^3$ for which the matrix
$$\begin{bmatrix} x & y & z \\ 1 & x & y \end{bmatrix}$$

has rank 1 is the curve with parametric equations $x = t$, $y = t^2$, $z = t^3$.


Prove: If $k \ne 0$, then $A$ and $kA$ have the same rank.


(a) Give an example of a 3 x 3 matrix whose column space is a plane through the origin in 3-space. 
(b) What kind of geometric object is the null space of your matrix? 


(c) What kind of geometric object is the row space of your matrix? 


(a) If $A$ is a $3 \times 5$ matrix, then the number of leading 1's in the reduced row echelon form of $A$ is at most
____. Why?

(b) If $A$ is a $3 \times 5$ matrix, then the number of parameters in the general solution of $A\mathbf{x} = \mathbf{0}$ is at most
____. Why?

(c) If $A$ is a $4 \times 3$ matrix, then the number of leading 1's in the reduced row echelon form of $A$ is at most
____. Why?

(d) If $A$ is a $5 \times 3$ matrix, then the number of parameters in the general solution of $A\mathbf{x} = \mathbf{0}$ is at most
____. Why?

Answer: 

(a) 3 

(b) 5 

(c) 3 

(d) 3 

(a) If $A$ is a $3 \times 5$ matrix, then the rank of $A$ is at most ____. Why?

(b) If $A$ is a $3 \times 5$ matrix, then the nullity of $A$ is at most ____. Why?

(c) If $A$ is a $3 \times 5$ matrix, then the rank of $A^T$ is at most ____. Why?

(d) If $A$ is a $3 \times 5$ matrix, then the nullity of $A^T$ is at most ____. Why?


Find matrices $A$ and $B$ for which $\operatorname{rank}(A) = \operatorname{rank}(B)$, but $\operatorname{rank}(A^2) \ne \operatorname{rank}(B^2)$.


Answer: 


0 1 1 2 
a=|( of s=() ‘| 
20. Prove: If a matrix A is not square, then either the row vectors or the column vectors of A are linearly dependent. 
True-False Exercises 
In parts (a)-(j) determine whether the statement is true or false, and justify your answer. 
(a) Either the row vectors or the column vectors of a square matrix are linearly independent. 


Answer: 


False 


(b) A matrix with linearly independent row vectors and linearly independent column vectors is square. 
Answer: 


True 


(c) The nullity of a nonzero $m \times n$ matrix is at most $m$.
Answer: 


False 


(d) Adding one additional column to a matrix increases its rank by one. 
Answer: 


False 


(e) The nullity of a square matrix with linearly dependent rows is at least one. 
Answer: 


True 


(f) If $A$ is square and $A\mathbf{x} = \mathbf{b}$ is inconsistent for some vector $\mathbf{b}$, then the nullity of $A$ is zero.
Answer: 


False 


(g) If a matrix A has more rows than columns, then the dimension of the row space is greater than the dimension of 
the column space. 


Answer: 
False 

(h) If $\operatorname{rank}(A^T) = \operatorname{rank}(A)$, then $A$ is square.
Answer: 


False 


(i) There is no 3 x 3 matrix whose row space and null space are both lines in 3-space. 


Answer: 


True 


(j) If $V$ is a subspace of $R^n$ and $W$ is a subspace of $V$, then $W^\perp$ is a subspace of $V^\perp$.
Answer: 


False 




4.9 Matrix Transformations from $R^n$ to $R^m$


In this section we will study functions of the form $\mathbf{w} = F(\mathbf{x})$, where the independent variable $\mathbf{x}$ is a vector in $R^n$ and the
dependent variable $\mathbf{w}$ is a vector in $R^m$. We will concentrate on a special class of such functions called "matrix
transformations." Such transformations are fundamental in the study of linear algebra and have important applications
in physics, engineering, social sciences, and various branches of mathematics.


Functions and Transformations 


Recall that a function is a rule that associates with each element of a set $A$ one and only one element in a set $B$. If $f$
associates the element $b$ with the element $a$, then we write


$$b = f(a)$$
and we say that $b$ is the image of $a$ under $f$ or that $f(a)$ is the value of $f$ at $a$. The set $A$ is called the domain of $f$ and the
set $B$ the codomain of $f$ (Figure 4.9.1). The subset of the codomain that consists of all images of points in the domain is
called the range of $f$.


[Figure: an element $a$ of the domain $A$ and its image $b = f(a)$ in the codomain $B$.]


Figure 4.9.1


For many common functions the domain and codomain are sets of real numbers, but in this text we will be concerned 
with functions for which the domain and codomain are vector spaces. 


DEFINITION 1 


If $V$ and $W$ are vector spaces, and if $f$ is a function with domain $V$ and codomain $W$, then we say that $f$ is a
transformation from $V$ to $W$ or that $f$ maps $V$ to $W$, which we denote by writing


$$f: V \to W$$


In the special case where $V = W$, the transformation is also called an operator on $V$.


In this section we will be concerned exclusively with transformations from $R^n$ to $R^m$; transformations of general vector
spaces will be considered in a later section. To illustrate one way in which such transformations can arise, suppose that
$f_1, f_2, \ldots, f_m$ are real-valued functions of $n$ variables, say


$$\begin{aligned} w_1 &= f_1(x_1, x_2, \ldots, x_n) \\ w_2 &= f_2(x_1, x_2, \ldots, x_n) \\ &\;\;\vdots \\ w_m &= f_m(x_1, x_2, \ldots, x_n) \end{aligned} \tag{1}$$


These $m$ equations assign a unique point $(w_1, w_2, \ldots, w_m)$ in $R^m$ to each point $(x_1, x_2, \ldots, x_n)$ in $R^n$ and thus define a
transformation from $R^n$ to $R^m$. If we denote this transformation by $T$, then $T: R^n \to R^m$ and


$$T(x_1, x_2, \ldots, x_n) = (w_1, w_2, \ldots, w_m)$$


Matrix Transformations 


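In the special case where the equations in 1 are linear, they can be expressed in the form


$$\begin{aligned} w_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ w_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ &\;\;\vdots \\ w_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{aligned} \tag{2}$$
which we can write in matrix notation as
$$\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_m \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \tag{3}$$
or more briefly as
$$\mathbf{w} = A\mathbf{x} \tag{4}$$


Although we could view this as a linear system, we will view it instead as a transformation that maps the column vector
$\mathbf{x}$ in $R^n$ into the column vector $\mathbf{w}$ in $R^m$ by multiplying $\mathbf{x}$ on the left by $A$. We call this a matrix transformation (or
matrix operator if $m = n$), and we denote it by $T_A: R^n \to R^m$. With this notation, Equation 4 can be expressed as


$$\mathbf{w} = T_A(\mathbf{x}) \tag{5}$$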


The matrix transformation $T_A$ is called multiplication by $A$, and the matrix $A$ is called the standard matrix for the
transformation.


We will also find it convenient, on occasion, to express 5 in the schematic form
$$\mathbf{x} \xrightarrow{\;T_A\;} \mathbf{w} \tag{6}$$
which is read "$T_A$ maps $\mathbf{x}$ into $\mathbf{w}$."
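EXAMPLE 1 A Matrix Transformation from $R^4$ to $R^3$


The matrix transformation $T: R^4 \to R^3$ defined by the equations


$$\begin{aligned} w_1 &= 2x_1 - 3x_2 + x_3 - 5x_4 \\ w_2 &= 4x_1 + x_2 - 2x_3 + x_4 \\ w_3 &= 5x_1 - x_2 + 4x_3 \end{aligned} \tag{7}$$


can be expressed in matrix form as


$$\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} 2 & -3 & 1 & -5 \\ 4 & 1 & -2 & 1 \\ 5 & -1 & 4 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \tag{8}$$
so the standard matrix for $T$ is
$$A = \begin{bmatrix} 2 & -3 & 1 & -5 \\ 4 & 1 & -2 & 1 \\ 5 & -1 & 4 & 0 \end{bmatrix}$$


The image of a point $(x_1, x_2, x_3, x_4)$ can be computed directly from the defining equations 7 or from 8
by matrix multiplication. For example, if
$$(x_1, x_2, x_3, x_4) = (1, -3, 0, 2)$$
then substituting in 7 yields $w_1 = 1$, $w_2 = 3$, $w_3 = 8$ (verify), or alternatively from 8,
$$\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} 2 & -3 & 1 & -5 \\ 4 & 1 & -2 & 1 \\ 5 & -1 & 4 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -3 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 8 \end{bmatrix}$$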
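The "verify" step in Example 1 amounts to one matrix-vector product. A minimal sketch follows; the helper name `apply_matrix` is an assumption introduced for illustration, and the entries of `A` are the standard matrix of Example 1 as read from the text.

```python
def apply_matrix(A, x):
    """Compute the image w = A x, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[2, -3,  1, -5],
     [4,  1, -2,  1],
     [5, -1,  4,  0]]

w = apply_matrix(A, [1, -3, 0, 2])
print(w)  # components w1, w2, w3 of the image
```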


Some Notational Matters 


Sometimes we will want to denote a matrix transformation without giving a name to the matrix itself. In such cases we
will denote the standard matrix for $T: R^n \to R^m$ by the symbol $[T]$. Thus, the equation


$$T(\mathbf{x}) = [T]\mathbf{x} \tag{9}$$


is simply the statement that $T$ is a matrix transformation with standard matrix $[T]$, and the image of $\mathbf{x}$ under this
transformation is the product of the matrix $[T]$ and the column vector $\mathbf{x}$.


Properties of Matrix Transformations 


The following theorem lists four basic properties of matrix transformations that follow from properties of matrix 
multiplication. 


THEOREM 4.9.1 


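For every matrix $A$ the matrix transformation $T_A: R^n \to R^m$ has the following properties for all vectors $\mathbf{u}$ and $\mathbf{v}$
in $R^n$ and for every scalar $k$:


(a) $T_A(\mathbf{0}) = \mathbf{0}$
(b) $T_A(k\mathbf{u}) = kT_A(\mathbf{u})$ [Homogeneity property]
(c) $T_A(\mathbf{u} + \mathbf{v}) = T_A(\mathbf{u}) + T_A(\mathbf{v})$ [Additivity property]
(d) $T_A(\mathbf{u} - \mathbf{v}) = T_A(\mathbf{u}) - T_A(\mathbf{v})$


Proof All four parts are restatements of familiar properties of matrix multiplication:


$$A\mathbf{0} = \mathbf{0}, \quad A(k\mathbf{u}) = k(A\mathbf{u}), \quad A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v}, \quad A(\mathbf{u} - \mathbf{v}) = A\mathbf{u} - A\mathbf{v}$$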


It follows from Theorem 4.9.1 that a matrix transformation maps linear combinations of vectors in $R^n$ into the
corresponding linear combinations in $R^m$ in the sense that


$$T_A(k_1\mathbf{u}_1 + k_2\mathbf{u}_2 + \cdots + k_r\mathbf{u}_r) = k_1T_A(\mathbf{u}_1) + k_2T_A(\mathbf{u}_2) + \cdots + k_rT_A(\mathbf{u}_r) \tag{10}$$
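Formula 10 can be spot-checked on concrete vectors. In the sketch below the matrix, the vectors `u1` and `u2`, and the scalars `k1` and `k2` are arbitrary choices made for illustration, and `apply_matrix` is a hypothetical helper. It compares the image of a linear combination with the same combination of the images.

```python
def apply_matrix(A, x):
    """Compute T_A(x) = A x, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[2, -3,  1, -5],
     [4,  1, -2,  1],
     [5, -1,  4,  0]]
u1, u2 = [1, 0, 2, -1], [0, 3, 1, 1]
k1, k2 = 2, -3

# Left side: transform the linear combination k1*u1 + k2*u2.
combination = [k1 * a + k2 * b for a, b in zip(u1, u2)]
left = apply_matrix(A, combination)

# Right side: form the same linear combination of the images.
right = [k1 * a + k2 * b for a, b in zip(apply_matrix(A, u1), apply_matrix(A, u2))]

print(left, right, left == right)
```

A check on sample vectors is not a proof, but it makes the statement concrete: the two sides agree component by component.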


Depending on whether $n$-tuples and $m$-tuples are regarded as vectors or points, the geometric effect of a matrix
transformation $T_A: R^n \to R^m$ is to map each vector (point) in $R^n$ into a vector (point) in $R^m$ (Figure 4.9.2).


T maps vectors to vectors. T maps points to points.


Figure 4.9.2


The following theorem states that if two matrix transformations from $R^n$ to $R^m$ have the same image at each point of
$R^n$, then the matrices themselves must be the same.


THEOREM 4.9.2


If $T_A: R^n \to R^m$ and $T_B: R^n \to R^m$ are matrix transformations, and if $T_A(\mathbf{x}) = T_B(\mathbf{x})$ for every vector $\mathbf{x}$ in $R^n$,
then $A = B$.


Proof To say that $T_A(\mathbf{x}) = T_B(\mathbf{x})$ for every vector in $R^n$ is the same as saying that


$$A\mathbf{x} = B\mathbf{x}$$
for every vector $\mathbf{x}$ in $R^n$. This is true, in particular, if $\mathbf{x}$ is any of the standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ for $R^n$; that is,


$$A\mathbf{e}_j = B\mathbf{e}_j \quad (j = 1, 2, \ldots, n) \tag{11}$$


Since every entry of $\mathbf{e}_j$ is 0 except for the $j$th, which is 1, it follows from Theorem 1.3.1 that $A\mathbf{e}_j$ is the $j$th column of $A$
and $B\mathbf{e}_j$ is the $j$th column of $B$. Thus, it follows from 11 that corresponding columns of $A$ and $B$ are the same, and hence
that $A = B$.


EXAMPLE 2 Zero Transformations


If $0$ is the $m \times n$ zero matrix, then
$$T_0(\mathbf{x}) = 0\mathbf{x} = \mathbf{0}$$


so multiplication by zero maps every vector in $R^n$ into the zero vector in $R^m$. We call $T_0$ the zero
transformation from $R^n$ to $R^m$.


EXAMPLE 3 Identity Operators


If $I$ is the $n \times n$ identity matrix, then
$$T_I(\mathbf{x}) = I\mathbf{x} = \mathbf{x}$$


so multiplication by $I$ maps every vector in $R^n$ into itself. We call $T_I$ the identity operator on $R^n$.


A Procedure for Finding Standard Matrices 


There is a way of finding the standard matrix for a matrix transformation from $R^n$ to $R^m$ by considering the effect of
that transformation on the standard basis vectors for $R^n$. To explain the idea, suppose that $A$ is unknown and that
$$\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$$
are the standard basis vectors for $R^n$. Suppose also that the images of these vectors under the transformation $T_A$ are
$$T_A(\mathbf{e}_1) = A\mathbf{e}_1, \quad T_A(\mathbf{e}_2) = A\mathbf{e}_2, \quad \ldots, \quad T_A(\mathbf{e}_n) = A\mathbf{e}_n$$
It follows from Theorem 1.3.1 that $A\mathbf{e}_j$ is a linear combination of the columns of $A$ in which the successive coefficients
are the entries of $\mathbf{e}_j$. But all entries of $\mathbf{e}_j$ are zero except the $j$th, so the product $A\mathbf{e}_j$ is just the $j$th column of the matrix
$A$. Thus,


$$A = [T_A(\mathbf{e}_1) \mid T_A(\mathbf{e}_2) \mid \cdots \mid T_A(\mathbf{e}_n)] \tag{12}$$


In summary, we have the following procedure for finding the standard matrix for a matrix transformation: 


Finding the Standard Matrix for a Matrix Transformation 


Step 1. Find the images of the standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ for $R^n$ in column form.


Step 2. Construct the matrix that has the images obtained in Step 1 as its successive columns. This matrix is the 
standard matrix for the transformation. 
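The two-step procedure translates directly into code. The following is a sketch under assumed names (`standard_matrix`, `reflect_about_y_equals_x`, and `example_1` are hypothetical): the transformation is passed in as a function, its values on the standard basis vectors are collected, and those images become the successive columns of the matrix.

```python
def standard_matrix(T, n):
    """Build the standard matrix of T: R^n -> R^m from the images of e_1, ..., e_n."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]  # Step 1: j-th standard basis vector
        columns.append(list(T(e_j)))                  # its image, kept as a column
    # Step 2: the images are the columns, so transpose to get a list of rows.
    return [list(row) for row in zip(*columns)]

def reflect_about_y_equals_x(v):
    x, y = v
    return (y, x)

def example_1(v):
    x1, x2, x3, x4 = v
    return (2*x1 - 3*x2 + x3 - 5*x4, 4*x1 + x2 - 2*x3 + x4, 5*x1 - x2 + 4*x3)

print(standard_matrix(reflect_about_y_equals_x, 2))
print(standard_matrix(example_1, 4))
```

The procedure recovers the correct matrix only when the function really is a matrix transformation; for a nonlinear function the resulting matrix would not reproduce the function's values.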


Reflection Operators 


Some of the most basic matrix operators on $R^2$ and $R^3$ are those that map each point into its symmetric image about a
fixed line or a fixed plane; these are called reflection operators. Table 1 shows the standard matrices for the reflections
about the coordinate axes in $R^2$, and Table 2 shows the standard matrices for the reflections about the coordinate planes
in $R^3$. In each case the standard matrix was obtained by finding the images of the standard basis vectors, converting
those images to column vectors, and then using those column vectors as successive columns of the standard matrix.


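Table 1

Operator | Images of $\mathbf{e}_1$ and $\mathbf{e}_2$ | Standard Matrix
Reflection about the y-axis, $T(x, y) = (-x, y)$ | $T(\mathbf{e}_1) = T(1, 0) = (-1, 0)$, $T(\mathbf{e}_2) = T(0, 1) = (0, 1)$ | $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$
Reflection about the x-axis, $T(x, y) = (x, -y)$ | $T(\mathbf{e}_1) = T(1, 0) = (1, 0)$, $T(\mathbf{e}_2) = T(0, 1) = (0, -1)$ | $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$
Reflection about the line $y = x$, $T(x, y) = (y, x)$ | $T(\mathbf{e}_1) = T(1, 0) = (0, 1)$, $T(\mathbf{e}_2) = T(0, 1) = (1, 0)$ | $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$


Table 2

Operator | Images of $\mathbf{e}_1$, $\mathbf{e}_2$, $\mathbf{e}_3$ | Standard Matrix
Reflection about the xy-plane, $T(x, y, z) = (x, y, -z)$ | $T(1, 0, 0) = (1, 0, 0)$, $T(0, 1, 0) = (0, 1, 0)$, $T(0, 0, 1) = (0, 0, -1)$ | $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$
Reflection about the xz-plane, $T(x, y, z) = (x, -y, z)$ | $T(1, 0, 0) = (1, 0, 0)$, $T(0, 1, 0) = (0, -1, 0)$, $T(0, 0, 1) = (0, 0, 1)$ | $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
Reflection about the yz-plane, $T(x, y, z) = (-x, y, z)$ | $T(1, 0, 0) = (-1, 0, 0)$, $T(0, 1, 0) = (0, 1, 0)$, $T(0, 0, 1) = (0, 0, 1)$ | $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$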


Projection Operators 


Matrix operators on $R^2$ and $R^3$ that map each point into its orthogonal projection on a fixed line or plane are called
projection operators (or more precisely, orthogonal projection operators). Table 3 shows the standard matrices for the
orthogonal projections on the coordinate axes in $R^2$, and Table 4 shows the standard matrices for the orthogonal
projections on the coordinate planes in $R^3$.


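Table 3

Operator | Images of $\mathbf{e}_1$ and $\mathbf{e}_2$ | Standard Matrix
Orthogonal projection on the x-axis, $T(x, y) = (x, 0)$ | $T(\mathbf{e}_1) = T(1, 0) = (1, 0)$, $T(\mathbf{e}_2) = T(0, 1) = (0, 0)$ | $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$
Orthogonal projection on the y-axis, $T(x, y) = (0, y)$ | $T(\mathbf{e}_1) = T(1, 0) = (0, 0)$, $T(\mathbf{e}_2) = T(0, 1) = (0, 1)$ | $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$


Table 4

Operator | Images of $\mathbf{e}_1$, $\mathbf{e}_2$, $\mathbf{e}_3$ | Standard Matrix
Orthogonal projection on the xy-plane, $T(x, y, z) = (x, y, 0)$ | $T(1, 0, 0) = (1, 0, 0)$, $T(0, 1, 0) = (0, 1, 0)$, $T(0, 0, 1) = (0, 0, 0)$ | $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$
Orthogonal projection on the xz-plane, $T(x, y, z) = (x, 0, z)$ | $T(1, 0, 0) = (1, 0, 0)$, $T(0, 1, 0) = (0, 0, 0)$, $T(0, 0, 1) = (0, 0, 1)$ | $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
Orthogonal projection on the yz-plane, $T(x, y, z) = (0, y, z)$ | $T(1, 0, 0) = (0, 0, 0)$, $T(0, 1, 0) = (0, 1, 0)$, $T(0, 0, 1) = (0, 0, 1)$ | $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$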


Rotation Operators 


Matrix operators on $R^2$ and $R^3$ that move points along circular arcs are called rotation operators. Let us consider how
to find the standard matrix for the rotation operator $T: R^2 \to R^2$ that moves points counterclockwise about the origin
through an angle $\theta$ (Figure 4.9.3). As illustrated in Figure 4.9.3, the images of the standard basis vectors are

$$T(\mathbf{e}_1) = T(1, 0) = (\cos\theta, \sin\theta) \quad\text{and}\quad T(\mathbf{e}_2) = T(0, 1) = (-\sin\theta, \cos\theta)$$


so the standard matrix for $T$ is
$$[T(\mathbf{e}_1) \mid T(\mathbf{e}_2)] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$


Figure 4.9.3


In keeping with common usage we will denote this operator by $R_\theta$ and call
$$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \tag{13}$$


the rotation matrix for $R^2$. If $\mathbf{x} = (x, y)$ is a vector in $R^2$, and if $\mathbf{w} = (w_1, w_2)$ is its image under the rotation, then the
relationship $\mathbf{w} = R_\theta\mathbf{x}$ can be written in component form as
$$\begin{aligned} w_1 &= x\cos\theta - y\sin\theta \\ w_2 &= x\sin\theta + y\cos\theta \end{aligned} \tag{14}$$


These are called the rotation equations for $R^2$. These ideas are summarized in Table 5.


Table 5

Operator | Rotation Equations | Standard Matrix
Rotation through an angle $\theta$ | $w_1 = x\cos\theta - y\sin\theta$, $w_2 = x\sin\theta + y\cos\theta$ | $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$


In the plane, counterclockwise angles are positive
and clockwise angles are negative. The rotation
matrix for a clockwise rotation of $-\theta$ radians can be
obtained by replacing $\theta$ by $-\theta$ in 13. After
simplification this yields
$$R_{-\theta} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$


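EXAMPLE 4 A Rotation Operator


Find the image of $\mathbf{x} = (1, 1)$ under a rotation of $\pi/6$ radians ($= 30^\circ$) about the origin.


Solution It follows from 13 with $\theta = \pi/6$ that


$$R_{\pi/6}\mathbf{x} = \begin{bmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{3} - 1}{2} \\ \frac{1 + \sqrt{3}}{2} \end{bmatrix} \approx \begin{bmatrix} 0.37 \\ 1.37 \end{bmatrix}$$


or in comma-delimited notation, $R_{\pi/6}(1, 1) \approx (0.37, 1.37)$.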
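The computation in Example 4 can be reproduced in floating-point arithmetic. This is a minimal sketch; `rotation_matrix` and `apply_matrix` are hypothetical helper names, and the matrix is the rotation matrix for $R^2$ with the angle in radians.

```python
import math

def rotation_matrix(theta):
    """Standard matrix for a counterclockwise rotation of R^2 through theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s,  c]]

def apply_matrix(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

w = apply_matrix(rotation_matrix(math.pi / 6), [1, 1])
print([round(component, 2) for component in w])
```

Rotations preserve length, so the image has the same norm as $(1, 1)$, namely $\sqrt{2}$; that gives a quick sanity check on the result.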


Rotations in $R^3$


A rotation of vectors in $R^3$ is usually described in relation to a ray emanating from the origin, called the axis of
rotation. As a vector revolves around the axis of rotation, it sweeps out some portion of a cone (Figure 4.9.4a). The
angle of rotation, which is measured in the base of the cone, is described as "clockwise" or "counterclockwise" in
relation to a viewpoint that is along the axis of rotation looking toward the origin. For example, in Figure 4.9.4a the
vector $\mathbf{w}$ results from rotating the vector $\mathbf{x}$ counterclockwise around the axis $l$ through an angle $\theta$. As in $R^2$, angles are
positive if they are generated by counterclockwise rotations and negative if they are generated by clockwise rotations.


(a) Angle of rotation (b) Right-hand rule


Figure 4.9.4


The most common way of describing a general axis of rotation is to specify a nonzero vector $\mathbf{u}$ that runs along the axis
of rotation and has its initial point at the origin. The counterclockwise direction for a rotation about the axis can then be
determined by a "right-hand rule" (Figure 4.9.4b): If the thumb of the right hand points in the direction of $\mathbf{u}$, then the
cupped fingers point in a counterclockwise direction.


A rotation operator on $R^3$ is a matrix operator that rotates each vector in $R^3$ about some rotation axis through a fixed
angle $\theta$. In Table 6 we have described the rotation operators on $R^3$ whose axes of rotation are the positive coordinate
axes. For each of these rotations one of the components is unchanged, and the relationships between the other
components can be derived by the same procedure used to derive 14. For example, in the rotation about the z-axis, the
z-components of $\mathbf{x}$ and $\mathbf{w} = T(\mathbf{x})$ are the same, and the x- and y-components are related as in 14. This yields the rotation
equation shown in the last row of Table 6.


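Table 6

Counterclockwise rotation about the positive x-axis through an angle $\theta$
    Rotation equations: $w_1 = x$, $w_2 = y\cos\theta - z\sin\theta$, $w_3 = y\sin\theta + z\cos\theta$
    Standard matrix: $\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}$

Counterclockwise rotation about the positive y-axis through an angle $\theta$
    Rotation equations: $w_1 = x\cos\theta + z\sin\theta$, $w_2 = y$, $w_3 = -x\sin\theta + z\cos\theta$
    Standard matrix: $\begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}$

Counterclockwise rotation about the positive z-axis through an angle $\theta$
    Rotation equations: $w_1 = x\cos\theta - y\sin\theta$, $w_2 = x\sin\theta + y\cos\theta$, $w_3 = z$
    Standard matrix: $\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$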


For completeness, we note that the standard matrix for a counterclockwise rotation through an angle $\theta$ about an axis in
$R^3$, which is determined by an arbitrary unit vector u = (a, b, c) that has its initial point at the origin, is

\[
\begin{bmatrix}
a^2(1-\cos\theta)+\cos\theta & ab(1-\cos\theta)-c\sin\theta & ac(1-\cos\theta)+b\sin\theta \\
ab(1-\cos\theta)+c\sin\theta & b^2(1-\cos\theta)+\cos\theta & bc(1-\cos\theta)-a\sin\theta \\
ac(1-\cos\theta)-b\sin\theta & bc(1-\cos\theta)+a\sin\theta & c^2(1-\cos\theta)+\cos\theta
\end{bmatrix} \tag{15}
\]

The derivation can be found in the book Principles of Interactive Computer Graphics, by W. M. Newman and R. F.
Sproull (New York: McGraw-Hill, 1979). You may find it instructive to derive the results in Table 6 as special cases of
this more general result.
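As a numerical companion to that suggestion, the sketch below evaluates Formula 15 for the unit vector u = (0, 0, 1) and compares the result with the z-axis rotation matrix of Table 6. It is only a spot check at one arbitrarily chosen angle, not a derivation, and the function names are illustrative assumptions.

```python
import math

def rotation_about_axis(u, theta):
    """Formula 15: rotation through theta about the unit vector u = (a, b, c)."""
    a, b, c = u
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k = 1 - cos_t
    return [
        [a * a * k + cos_t,     a * b * k - c * sin_t, a * c * k + b * sin_t],
        [a * b * k + c * sin_t, b * b * k + cos_t,     b * c * k - a * sin_t],
        [a * c * k - b * sin_t, b * c * k + a * sin_t, c * c * k + cos_t],
    ]

def rotation_about_z(theta):
    """Table 6: counterclockwise rotation about the positive z-axis."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [[cos_t, -sin_t, 0.0],
            [sin_t,  cos_t, 0.0],
            [0.0,    0.0,   1.0]]

def close(m1, m2, tol=1e-12):
    """True if two matrices agree entry by entry within tol."""
    return all(abs(x - y) <= tol for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))

theta = 0.7  # arbitrary test angle in radians
print(close(rotation_about_axis((0, 0, 1), theta), rotation_about_z(theta)))  # True
```

The same check can be repeated with u = (1, 0, 0) and u = (0, 1, 0) for the other two rows of the table.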


Dilations and Contractions 


If k is a nonnegative scalar, then the operator T(x) = kx on $R^2$ or $R^3$ has the effect of increasing or decreasing the
length of each vector by a factor of k. If $0 \le k < 1$ the operator is called a contraction with factor k, and if $k > 1$ it is
called a dilation with factor k (Figure 4.9.5). If k = 1, then T is the identity operator and can be regarded either as a
contraction or a dilation. Tables 7 and 8 illustrate these operators.


Figure 4.9.5 (a) $0 \le k < 1$; (b) $k > 1$

Table 7

Contraction with factor k on $R^2$ ($0 \le k < 1$) and dilation with factor k on $R^2$ ($k > 1$)
    Operator: T(x, y) = (kx, ky)
    Effect on the standard basis: (1, 0) → (k, 0), (0, 1) → (0, k)
    Standard matrix: $\begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix}$

Table 8

Contraction with factor k on $R^3$ ($0 \le k \le 1$) and dilation with factor k on $R^3$ ($k \ge 1$)
    Operator: T(x, y, z) = (kx, ky, kz)
    Standard matrix: $\begin{bmatrix} k & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & k \end{bmatrix}$


Yaw, Pitch, and Roll 


In aeronautics and astronautics, the orientation of an aircraft or space shuttle relative to an xyz-coordinate 
system is often described in terms of angles called yaw, pitch, and roll. If, for example, an aircraft is flying
along the y-axis and the xy-plane defines the horizontal, then the aircraft's angle of rotation about the z-axis is
called the yaw, its angle of rotation about the x-axis is called the pitch, and its angle of rotation about the y-axis 
is called the roll. A combination of yaw, pitch, and roll can be achieved by a single rotation about some axis 
through the origin. This is, in fact, how a space shuttle makes attitude adjustments—it doesn't perform each 
rotation separately; it calculates one axis, and rotates about that axis to get the correct orientation. Such rotation 
maneuvers are used to align an antenna, point the nose toward a celestial object, or position a payload bay for 
docking. 


Expansions and Compressions


In a dilation or contraction of $R^2$ or $R^3$, all coordinates are multiplied by a factor k. If only one of the coordinates is
multiplied by k, then the resulting operator is called an expansion or compression with factor k. This is illustrated in
Table 9 for $R^2$. You should have no trouble extending these results to $R^3$.


Table 9

Compression of $R^2$ in the x-direction with factor k ($0 \le k < 1$) and expansion of $R^2$ in the x-direction with factor k ($k > 1$)
    Operator: T(x, y) = (kx, y)
    Effect on the standard basis: (1, 0) → (k, 0), (0, 1) → (0, 1)
    Standard matrix: $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$

Compression of $R^2$ in the y-direction with factor k ($0 \le k < 1$) and expansion of $R^2$ in the y-direction with factor k ($k > 1$)
    Operator: T(x, y) = (x, ky)
    Effect on the standard basis: (1, 0) → (1, 0), (0, 1) → (0, k)
    Standard matrix: $\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$


Shears 


A matrix operator of the form T(x, y) = (x + ky, y) translates a point (x, y) in the xy-plane parallel to the x-axis by
an amount ky that is proportional to the y-coordinate of the point. This operator leaves the points on the x-axis fixed
(since y = 0), but as we progress away from the x-axis, the translation distance increases. We call this operator the
shear in the x-direction with factor k. Similarly, a matrix operator of the form T(x, y) = (x, y + kx) is called the
shear in the y-direction with factor k. Table 10 illustrates the basic information about shears in $R^2$.


Table 10

Shear of $R^2$ in the x-direction with factor k (illustrated for k > 0 and k < 0)
    Operator: T(x, y) = (x + ky, y)
    Effect on the standard basis: (1, 0) → (1, 0), (0, 1) → (k, 1)
    Standard matrix: $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$

Shear of $R^2$ in the y-direction with factor k (illustrated for k > 0 and k < 0)
    Operator: T(x, y) = (x, y + kx)
    Effect on the standard basis: (1, 0) → (1, k), (0, 1) → (0, 1)
    Standard matrix: $\begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}$


EXAMPLE 5 Some Basic Matrix Operators on $R^2$


In each part describe the matrix operator corresponding to A, and show its effect on the unit square.

\[ \text{(a) } A_1 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \qquad \text{(b) } A_2 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \qquad \text{(c) } A_3 = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \]

Solution By comparing the forms of these matrices to those in Tables 7, 9, and 10, we see that the
matrix $A_1$ corresponds to a shear in the x-direction with factor 2, the matrix $A_2$ corresponds to a dilation
with factor 2, and $A_3$ corresponds to an expansion in the x-direction with factor 2. The effects of these
operators on the unit square are shown in Figure 4.9.6.
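The effect on the unit square can also be tabulated by applying each matrix to the four corners of the square. The sketch below does this in plain Python; the dictionary labels and the helper `apply` are illustrative names, not notation from the text.

```python
def apply(matrix, point):
    """Multiply a 2 x 2 matrix (list of rows) by a point (x, y)."""
    return tuple(sum(a * x for a, x in zip(row, point)) for row in matrix)

# Corners of the unit square, listed counterclockwise from the origin.
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]

operators = {
    "A1: shear in the x-direction, factor 2": [[1, 2], [0, 1]],
    "A2: dilation, factor 2": [[2, 0], [0, 2]],
    "A3: expansion in the x-direction, factor 2": [[2, 0], [0, 1]],
}

images = {name: [apply(A, p) for p in unit_square] for name, A in operators.items()}
for name, corners in images.items():
    print(name, corners)
```

The shear moves the top edge of the square two units to the right, the dilation doubles both side lengths, and the expansion doubles only the horizontal side.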


Figure 4.9.6 


OPTIONAL 
Orthogonal Projections on Lines Through the Origin 


In Table 3 we listed the standard matrices for the orthogonal projections on the coordinate axes in $R^2$. These are special
cases of the more general operator $T: R^2 \to R^2$ that maps each point into its orthogonal projection on a line L through
the origin that makes an angle $\theta$ with the positive x-axis (Figure 4.9.7). In Example 4 of Section 3.3 we used Formula 10
of that section to find the orthogonal projections of the standard basis vectors for $R^2$ on that line. Expressed in matrix
form, we found those projections to be

\[ T(\mathbf{e}_1) = \begin{bmatrix} \cos^2\theta \\ \sin\theta\cos\theta \end{bmatrix} \quad \text{and} \quad T(\mathbf{e}_2) = \begin{bmatrix} \sin\theta\cos\theta \\ \sin^2\theta \end{bmatrix} \]


Figure 4.9.7 


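Thus, the standard matrix for T is

\[ [T] = [T(\mathbf{e}_1) \mid T(\mathbf{e}_2)] = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta & \frac{1}{2}\sin 2\theta \\ \frac{1}{2}\sin 2\theta & \sin^2\theta \end{bmatrix} \]

In keeping with common usage, we will denote this operator by

\[ P_\theta = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta & \frac{1}{2}\sin 2\theta \\ \frac{1}{2}\sin 2\theta & \sin^2\theta \end{bmatrix} \tag{16} \]

We have included two versions of Formula 16
because both are commonly used. Whereas the first
version involves only the angle $\theta$, the second
involves both $\theta$ and $2\theta$.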


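EXAMPLE 6 Orthogonal Projection on a Line Through the Origin


Use Formula 16 to find the orthogonal projection of the vector x = (1, 5) on the line through the origin
that makes an angle of $\pi/6$ (= 30°) with the x-axis.


Solution Since $\sin(\pi/6) = 1/2$ and $\cos(\pi/6) = \sqrt{3}/2$, it follows from 16 that the standard matrix
for this projection is

\[ P_{\pi/6} = \begin{bmatrix} \cos^2(\pi/6) & \sin(\pi/6)\cos(\pi/6) \\ \sin(\pi/6)\cos(\pi/6) & \sin^2(\pi/6) \end{bmatrix} = \begin{bmatrix} \frac{3}{4} & \frac{\sqrt{3}}{4} \\ \frac{\sqrt{3}}{4} & \frac{1}{4} \end{bmatrix} \]

Thus,

\[ P_{\pi/6}\mathbf{x} = \begin{bmatrix} \frac{3}{4} & \frac{\sqrt{3}}{4} \\ \frac{\sqrt{3}}{4} & \frac{1}{4} \end{bmatrix} \begin{bmatrix} 1 \\ 5 \end{bmatrix} = \begin{bmatrix} \frac{3+5\sqrt{3}}{4} \\ \frac{\sqrt{3}+5}{4} \end{bmatrix} \approx \begin{bmatrix} 2.92 \\ 1.68 \end{bmatrix} \]

or in comma-delimited notation, $P_{\pi/6}(1, 5) \approx (2.92, 1.68)$.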
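Formula 16 translates directly into a small function. The following sketch, with illustrative helper names, applies it to the data of Example 6 and compares the numerical result with the exact values $(3 + 5\sqrt{3})/4$ and $(\sqrt{3} + 5)/4$.

```python
import math

def projection_matrix(theta):
    """Formula 16: orthogonal projection on the line through the origin at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, s * c],
            [s * c, s * s]]

def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of entries)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

projected = apply(projection_matrix(math.pi / 6), [1, 5])
exact = [(3 + 5 * math.sqrt(3)) / 4, (math.sqrt(3) + 5) / 4]

print(projected)
print(all(abs(p - e) < 1e-12 for p, e in zip(projected, exact)))  # True
```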


Reflections About Lines Through the Origin 


In Table 1 we listed the reflections about the coordinate axes in $R^2$. These are special cases of the more general operator
$H_\theta: R^2 \to R^2$ that maps each point into its reflection about a line L through the origin that makes an angle $\theta$ with the
positive x-axis (Figure 4.9.8). We could find the standard matrix for $H_\theta$ by finding the images of the standard basis
vectors, but instead we will take advantage of our work on orthogonal projections by using Formula 16 for $P_\theta$ to
find a formula for $H_\theta$.


Figure 4.9.8 


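You should be able to see from Figure 4.9.9 that for every vector x in $R^2$

\[ P_\theta\mathbf{x} - \mathbf{x} = \tfrac{1}{2}(H_\theta\mathbf{x} - \mathbf{x}) \quad \text{or equivalently} \quad H_\theta\mathbf{x} = (2P_\theta - I)\mathbf{x} \]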


Figure 4.9.9 


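Thus, it follows from Theorem 4.9.2 that

\[ H_\theta = 2P_\theta - I \tag{17} \]

and hence from 16 that

\[ H_\theta = \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix} \tag{18} \]
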
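EXAMPLE 7 Reflection About a Line Through the Origin

Find the reflection of the vector x = (1, 5) about the line through the origin that makes an angle of $\pi/6$ (= 30°)
with the x-axis.

Solution Since $\sin(\pi/3) = \sqrt{3}/2$ and $\cos(\pi/3) = 1/2$, it follows from 18 that the standard matrix
for this reflection is

\[ H_{\pi/6} = \begin{bmatrix} \cos(\pi/3) & \sin(\pi/3) \\ \sin(\pi/3) & -\cos(\pi/3) \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{bmatrix} \]

Thus,

\[ H_{\pi/6}\mathbf{x} = \begin{bmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{bmatrix} \begin{bmatrix} 1 \\ 5 \end{bmatrix} = \begin{bmatrix} \frac{1+5\sqrt{3}}{2} \\ \frac{\sqrt{3}-5}{2} \end{bmatrix} \approx \begin{bmatrix} 4.83 \\ -1.63 \end{bmatrix} \]

or in comma-delimited notation, $H_{\pi/6}(1, 5) \approx (4.83, -1.63)$.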
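The relationship $H_\theta = 2P_\theta - I$ in Formula 17 and the closed form in Formula 18 can be compared numerically. The sketch below builds the reflection matrix both ways for $\theta = \pi/6$ and then reflects x = (1, 5); the function names are illustrative assumptions.

```python
import math

def projection_matrix(theta):
    """Formula 16: orthogonal projection on the line at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, s * c],
            [s * c, s * s]]

def reflection_matrix(theta):
    """Formula 18: reflection about the line at angle theta."""
    c2, s2 = math.cos(2 * theta), math.sin(2 * theta)
    return [[c2,  s2],
            [s2, -c2]]

theta = math.pi / 6
P = projection_matrix(theta)
H = reflection_matrix(theta)

# Formula 17: H = 2P - I, computed entry by entry.
H_from_P = [[2 * P[i][j] - (1 if i == j else 0) for j in range(2)] for i in range(2)]
agree = all(abs(H[i][j] - H_from_P[i][j]) < 1e-12 for i in range(2) for j in range(2))

reflected = [H[0][0] * 1 + H[0][1] * 5, H[1][0] * 1 + H[1][1] * 5]
print(agree, [round(v, 2) for v in reflected])  # True [4.83, -1.63]
```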
Show that the standard matrices in Tables 1 and 3
are special cases of 18 and 16.


Concept Review
• Function
• Image
• Value
• Domain
• Codomain
• Transformation
• Relationships among the fundamental spaces
• Operator
• Matrix transformation
• Matrix operator
• Standard matrix
• Properties of matrix transformations
• Zero transformation
• Identity operator
• Reflection operator
• Projection operator
• Rotation operator
• Rotation matrix
• Rotation equations
• Axis of rotation in 3-space
• Angle of rotation in 3-space
• Expansion operator
• Compression operator
• Shear
• Dilation
• Contraction


Skills
• Find the domain and codomain of a transformation, and determine whether the transformation is linear.
• Find the standard matrix for a matrix transformation.
• Describe the effect of a matrix operator on the standard basis in $R^n$.


Exercise Set 4.9 
In Exercises 1–2, find the domain and codomain of the transformation $T_A(\mathbf{x}) = A\mathbf{x}$.


1. (a) A has size 3 × 2.
(b) A has size 2 × 3.
(c) A has size 3 × 3.
(d) A has size 1 × 6.


Answer:


(a) Domain: $R^2$; codomain: $R^3$
(b) Domain: $R^3$; codomain: $R^2$
(c) Domain: $R^3$; codomain: $R^3$
(d) Domain: $R^6$; codomain: $R^1$


2. (a) A has size 4 × 4.
(b) A has size 5 × 4.
(c) A has size 4 × 4.
(d) A has size 3 × 1.


3. If $T(x_1, x_2) = (x_1 + x_2, -x_2, 3x_1)$, then the domain of T is ____, the codomain of T is ____, and
the image of x = (1, −2) under T is ____.


Answer:


$R^2$, $R^3$, (−1, 2, 3)

4. If $T(x_1, x_2, x_3) = (x_1 + 2x_2, x_1 - 2x_3)$, then the domain of T is ____, the codomain of T is ____,
and the image of x = (0, −1, 4) under T is ____.


5. In each part, find the domain and codomain of the transformation defined by the equations, and determine whether
the transformation is linear.

(a) $w_1 = 3x_1 - 2x_2 + 4x_3$
    $w_2 = 5x_1 - 8x_2 + x_3$

(b) $w_1 = 2x_1x_2 - x_2$
    $w_2 = x_1 + 3x_1x_2$
    $w_3 = x_1 + x_2$

(c) $w_1 = 5x_1 - x_2 + x_3$
    $w_2 = -x_1 + x_2 + 7x_3$
    $w_3 = 2x_1 - 4x_2 - x_3$

(d) $w_1 = x_1^2 - 3x_2 + x_3 - 2x_4$
    $w_2 = 3x_1 - 4x_2 - x_3^2 + x_4$

Answer:

(a) Linear; $R^3 \to R^2$
(b) Nonlinear; $R^2 \to R^3$
(c) Linear; $R^3 \to R^3$
(d) Nonlinear; $R^4 \to R^2$


6. In each part, determine whether T is a matrix transformation. 
(a) T(x, y) = (2x, y) 
(b) T(x, y) =(-y, x) 
(c) T(x, y) = (2x +y,x—y) 
(d) T(x} — (”.»} 


(e) T(x, y) = (x, y + 1)

7. In each part, determine whether T is a matrix transformation.
(a) T(x, y, z) = (0, 0)
(b) T(x, y, z) = (1, 1)
(c) T(x, y, z) = (3x − 4y, 2x − 5z)
(d) T(x, y, z) = ($y^2$, z)
(e) T(x, y,z) = —-1,) 
Answer: 


(a) and (c) are matrix transformations; (b), (d), and (e) are not matrix transformations. 


8. Find the standard matrix for the transformation defined by the equations.

(a) $w_1 = 2x_1 - 3x_2 + x_4$
    $w_2 = 3x_1 + 5x_2 - x_4$

(b) $w_1 = 7x_1 + 2x_2 - 8x_3$
    $w_2 = -x_2 + 5x_3$
    $w_3 = 4x_1 + 7x_2 - x_3$

(c) $w_1 = -x_1 + x_2$
    $w_2 = 3x_1 - 2x_2$
    $w_3 = 5x_1 - 7x_2$

(d) $w_1 = x_1$
    $w_2 = x_1 + x_2$
    $w_3 = x_1 + x_2 + x_3$
    $w_4 = x_1 + x_2 + x_3 + x_4$


9. Find the standard matrix for the operator $T: R^3 \to R^3$ defined by

    $w_1 = 3x_1 + 5x_2 - x_3$
    $w_2 = 4x_1 - x_2 + x_3$
    $w_3 = 3x_1 + 2x_2 - x_3$

and then calculate T(−1, 2, 4) by directly substituting in the equations and also by matrix multiplication.


Answer:

\[ \begin{bmatrix} 3 & 5 & -1 \\ 4 & -1 & 1 \\ 3 & 2 & -1 \end{bmatrix}; \quad T(-1, 2, 4) = (3, -2, -3) \]

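10. Find the standard matrix for the operator T defined by the formula.
(a) $T(x_1, x_2) = (2x_1 - x_2, x_1 + x_2)$
(b) $T(x_1, x_2) = (x_1, x_2)$
(c) $T(x_1, x_2, x_3) = (x_1 + 2x_2 + x_3, x_1 + 5x_2, x_3)$
(d) $T(x_1, x_2, x_3) = (4x_1, 7x_2, -8x_3)$

11. Find the standard matrix for the transformation T defined by the formula.
(a) $T(x_1, x_2) = (x_2, -x_1, x_1 + 3x_2, x_1 - x_2)$
(b) $T(x_1, x_2, x_3, x_4) = (7x_1 + 2x_2 - x_3 + x_4, x_2 + x_3, -x_1)$
(c) $T(x_1, x_2, x_3) = (0, 0, 0, 0, 0)$
(d) $T(x_1, x_2, x_3, x_4) = (x_4, x_1, x_3, x_2, x_1 - x_3)$


Answer:
(a) $\begin{bmatrix} 0 & 1 \\ -1 & 0 \\ 1 & 3 \\ 1 & -1 \end{bmatrix}$
(b) $\begin{bmatrix} 7 & 2 & -1 & 1 \\ 0 & 1 & 1 & 0 \\ -1 & 0 & 0 & 0 \end{bmatrix}$
(c) $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$
(d) $\begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & -1 & 0 \end{bmatrix}$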

12. In each part, find T(x), and express the answer in matrix form.


° Trl=[5 ah == [2] 
(b) -1 
r\=[3 5 sfe= 1 
3 
(c) —2 1 4 X41 
Tl)=| 3 5 FJ, x=] %2 
6 0 =1 x3 

(d) =| J 
Ti/=| 2 4); x=[71] 

7 8 


13. In each part, use the standard matrix for T to find T(x); then check the result by calculating T(x) directly.
(a) $T(x_1, x_2) = (-x_1 + x_2, x_2)$; x = (−1, 4)
(b) $T(x_1, x_2, x_3) = (2x_1 - x_2 + x_3, x_2 + x_3, 0)$; x = (2, 1, −3)


Answer:


(a) T(−1, 4) = (5, 4)
(b) T(2, 1, −3) = (0, −2, 0)
14. Use matrix multiplication to find the reflection of (−1, 2) about
(a) the x-axis.
(b) the y-axis.
(c) the line y = x.

15. Use matrix multiplication to find the reflection of (2, −5, 3) about
(a) the xy-plane.
(b) the xz-plane.
(c) the yz-plane.


Answer:


(a) (2, −5, −3)
(b) (2, 5, 3)
(c) (−2, −5, 3)

16. Use matrix multiplication to find the orthogonal projection of (2, −5) on
(a) the x-axis.
(b) the y-axis.

17. Use matrix multiplication to find the orthogonal projection of (−2, 1, 3) on
(a) the xy-plane.
(b) the xz-plane.
(c) the yz-plane.


Answer:
(a) (−2, 1, 0)
(b) (−2, 0, 3)
(c) (0, 1, 3)

18. Use matrix multiplication to find the image of the vector (3, −4) when it is rotated through an angle of
(a) $\theta = 30°$.
(b) $\theta = -60°$.
(c) $\theta = 45°$.
(d) $\theta = 90°$.

19. Use matrix multiplication to find the image of the vector (−2, 1, 2) if it is rotated
(a) 30° about the x-axis.
(b) 45° about the y-axis.
(c) 90° about the z-axis.


Answer:

(a) $\left(-2, \frac{\sqrt{3}-2}{2}, \frac{1+2\sqrt{3}}{2}\right)$
(b) $(0, 1, 2\sqrt{2})$
(c) (−1, −2, 2)

20. Find the standard matrix for the operator that rotates a vector in $R^3$ through an angle of −60° about
(a) the x-axis.
(b) the y-axis.
(c) the z-axis.

21. Use matrix multiplication to find the image of the vector (−2, 1, 2) if it is rotated
(a) −30° about the x-axis.
(b) −45° about the y-axis.
(c) −90° about the z-axis.


Answer:


(a) $\left(-2, \frac{\sqrt{3}+2}{2}, \frac{-1+2\sqrt{3}}{2}\right)$
(b) $(-2\sqrt{2}, 1, 0)$
(c) (1, 2, 2)

22. In $R^3$ the orthogonal projections on the x-axis, y-axis, and z-axis are defined by
$T_1(x, y, z) = (x, 0, 0)$, $T_2(x, y, z) = (0, y, 0)$, $T_3(x, y, z) = (0, 0, z)$
respectively.
(a) Show that the orthogonal projections on the coordinate axes are matrix operators, and find their standard
matrices.
(b) Show that if $T: R^3 \to R^3$ is an orthogonal projection on one of the coordinate axes, then for every vector x in $R^3$,
the vectors T(x) and x − T(x) are orthogonal.
(c) Make a sketch showing x and x − T(x) in the case where T is the orthogonal projection on the x-axis.

23. Use Formula 15 to derive the standard matrices for the rotations about the x-axis, y-axis, and z-axis in $R^3$.

24. Use Formula 15 to find the standard matrix for a rotation of $\pi/2$ radians about the axis determined by the vector
v = (1, 1, 1). [Note: Formula 15 requires that the vector defining the axis of rotation have length 1.]

25. Use Formula 15 to find the standard matrix for a rotation of 180° about the axis determined by the vector
v = (2, 2, 1). [Note: Formula 15 requires that the vector defining the axis of rotation have length 1.]


Answer:

\[ \begin{bmatrix} -\frac{1}{9} & \frac{8}{9} & \frac{4}{9} \\ \frac{8}{9} & -\frac{1}{9} & \frac{4}{9} \\ \frac{4}{9} & \frac{4}{9} & -\frac{7}{9} \end{bmatrix} \]

26. It can be proved that if A is a 2 × 2 matrix with orthonormal column vectors and for which det(A) = 1, then
multiplication by A is a rotation through some angle $\theta$. Verify that


satisfies the stated conditions and find the angle of rotation. 


27. The result stated in Exercise 26 can be extended to $R^3$; that is, it can be proved that if A is a 3 × 3 matrix with
orthonormal column vectors and for which det(A) = 1, then multiplication by A is a rotation about some axis
through some angle $\theta$. Use Formula 15 to show that the angle of rotation satisfies the equation

\[ \cos\theta = \frac{\operatorname{tr}(A) - 1}{2} \]

28. Let A be a 3 × 3 matrix (other than the identity matrix) satisfying the conditions stated in Exercise 27. It can be
shown that if x is any nonzero vector in $R^3$, then the vector $\mathbf{u} = A\mathbf{x} + A^T\mathbf{x} + [1 - \operatorname{tr}(A)]\mathbf{x}$ determines an axis of
rotation when u is positioned with its initial point at the origin. [See “The Axis of Rotation: Analysis, Algebra,
Geometry,” by Dan Kalman, Mathematics Magazine, Vol. 62, No. 4, October 1989.]


(a) Show that multiplication by 


I 
O| wilco wl 
Ool~) wl ol 


wots w le ro|co 


is a rotation. 
(b) Find a vector of length 1 that defines an axis for the rotation. 
(c) Use the result in Exercise 27 to find the angle of rotation about the axis obtained in part (b). 


29. In words, describe the geometric effect of multiplying a vector x by the matrix A.

(a) $A = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}$

(b) $A = \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix}$

Answer:


(a) Twice the orthogonal projection on the x-axis.


(b) Twice the reflection about the x-axis.


30. In words, describe the geometric effect of multiplying a vector x by the matrix A. 


(a) ,_|2 9 
a-[7 5 
o [ys 4 

oF oe 
A= 

1 ¥3 

aa 


31. In words, describe the geometric effect of multiplying a vector x by the matrix

\[ A = \begin{bmatrix} \cos^2\theta - \sin^2\theta & -2\sin\theta\cos\theta \\ 2\sin\theta\cos\theta & \cos^2\theta - \sin^2\theta \end{bmatrix} \]

Answer:


Rotation through the angle $2\theta$.

32. If multiplication by A rotates a vector x in the xy-plane through an angle $\theta$, what is the effect of multiplying x by $A^T$?
Explain your reasoning.


33. Let $\mathbf{x}_0$ be a nonzero column vector in $R^2$, and suppose that $T: R^2 \to R^2$ is the transformation defined by the formula
$T(\mathbf{x}) = \mathbf{x}_0 + R_\theta\mathbf{x}$, where $R_\theta$ is the standard matrix of the rotation of $R^2$ about the origin through the angle $\theta$. Give a
geometric description of this transformation. Is it a matrix transformation? Explain.

Answer:


Rotation through the angle $\theta$ and translation by $\mathbf{x}_0$; not a matrix transformation since $\mathbf{x}_0$ is nonzero.


34. A function of the form f(x) = mx + b is commonly called a “linear function” because the graph of y = mx + b is
a line. Is f a matrix transformation on R?


35. Let $\mathbf{x} = \mathbf{x}_0 + t\mathbf{v}$ be a line in $R^n$, and let $T: R^n \to R^n$ be a matrix operator on $R^n$. What kind of geometric object is
the image of this line under the operator T? Explain your reasoning.


Answer:

A line in $R^n$.
True-False Exercises
In parts (a)–(i) determine whether the statement is true or false, and justify your answer.

(a) If A is a 2 × 3 matrix, then the domain of the transformation $T_A$ is $R^2$.

Answer:

False

(b) If A is an m × n matrix, then the codomain of the transformation $T_A$ is $R^n$.

Answer:

False

(c) If $T: R^n \to R^m$ and T(0) = 0, then T is a matrix transformation.

Answer:

False

(d) If $T: R^n \to R^m$ and $T(c_1\mathbf{x} + c_2\mathbf{y}) = c_1T(\mathbf{x}) + c_2T(\mathbf{y})$ for all scalars $c_1$ and $c_2$ and all vectors x and y in $R^n$, then
T is a matrix transformation.

Answer:

True

(e) There is only one matrix transformation $T: R^n \to R^m$ such that T(−x) = −T(x) for every vector x in $R^n$.

Answer:

False

(f) There is only one matrix transformation $T: R^n \to R^m$ such that T(x + y) = T(x − y) for all vectors x and y in $R^n$.

Answer:

True

(g) If b is a nonzero vector in $R^n$, then T(x) = x + b is a matrix operator on $R^n$.

Answer:

False


(h) 


The matrix is the standard matrix for a rotation. 


Mle pole 
role pole 


Answer: 


False 


(i) The standard matrices of the reflections about the coordinate axes in 2-space have the form $\begin{bmatrix} a & 0 \\ 0 & -a \end{bmatrix}$, where
$a = \pm 1$.
Answer: 


True 




4.10 Properties of Matrix Transformations 


In this section we will discuss properties of matrix transformations. We will show, for example, that if several 
matrix transformations are performed in succession, then the same result can be obtained by a single matrix 
transformation that is chosen appropriately. We will also explore the relationship between the invertibility of a 
matrix and properties of the corresponding transformation. 


Compositions of Matrix Transformations 


Suppose that $T_A$ is a matrix transformation from $R^n$ to $R^k$ and $T_B$ is a matrix transformation from $R^k$ to $R^m$. If x
is a vector in $R^n$, then $T_A$ maps this vector into a vector $T_A(\mathbf{x})$ in $R^k$, and $T_B$, in turn, maps that vector into the
vector $T_B(T_A(\mathbf{x}))$ in $R^m$. This process creates a transformation from $R^n$ to $R^m$ that we call the composition of
$T_B$ with $T_A$ and denote by the symbol

\[ T_B \circ T_A \]

which is read “$T_B$ circle $T_A$”. As illustrated in Figure 4.10.1, the transformation $T_A$ in the formula is performed
first; that is,

\[ (T_B \circ T_A)(\mathbf{x}) = T_B(T_A(\mathbf{x})) \tag{1} \]

This composition is itself a matrix transformation since

\[ (T_B \circ T_A)(\mathbf{x}) = T_B(T_A(\mathbf{x})) = B(T_A(\mathbf{x})) = B(A\mathbf{x}) = (BA)\mathbf{x} \]

which shows that it is multiplication by BA. This is expressed by the formula

\[ T_B \circ T_A = T_{BA} \tag{2} \]


WARNING


Just as it is not true, in general, that
\[ AB = BA \]
so it is not true, in general, that
\[ T_B \circ T_A = T_A \circ T_B \]
That is, order matters when matrix
transformations are composed.


Figure 4.10.1


Compositions can be defined for any finite succession of matrix transformations whose domains and ranges have
the appropriate dimensions. For example, to extend Formula 2 to three factors, consider the matrix
transformations

\[ T_A: R^n \to R^k, \quad T_B: R^k \to R^l, \quad T_C: R^l \to R^m \]

We define the composition $(T_C \circ T_B \circ T_A): R^n \to R^m$ by

\[ (T_C \circ T_B \circ T_A)(\mathbf{x}) = T_C(T_B(T_A(\mathbf{x}))) \]

As above, it can be shown that this is a matrix transformation whose standard matrix is CBA and that

\[ T_C \circ T_B \circ T_A = T_{CBA} \tag{3} \]

As in Formula 9 of Section 4.9, we can use square brackets to denote a matrix transformation without
referencing a specific matrix. Thus, for example, the formula

\[ [T_2 \circ T_1] = [T_2][T_1] \tag{4} \]

is a restatement of Formula 2 which states that the standard matrix for a composition is the product of the
standard matrices in the appropriate order. Similarly,

\[ [T_3 \circ T_2 \circ T_1] = [T_3][T_2][T_1] \tag{5} \]

is a restatement of Formula 3.


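EXAMPLE 1 Composition of Two Rotations


Let $T_1: R^2 \to R^2$ and $T_2: R^2 \to R^2$ be the matrix operators that rotate vectors through the angles $\theta_1$
and $\theta_2$, respectively. Thus the operation

\[ (T_2 \circ T_1)(\mathbf{x}) = T_2(T_1(\mathbf{x})) \]

first rotates x through the angle $\theta_1$, then rotates $T_1(\mathbf{x})$ through the angle $\theta_2$. It follows that the net
effect of $T_2 \circ T_1$ is to rotate each vector in $R^2$ through the angle $\theta_1 + \theta_2$ (Figure 4.10.2). Thus, the
standard matrices for these matrix operators are

\[ [T_1] = \begin{bmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{bmatrix}, \quad [T_2] = \begin{bmatrix} \cos\theta_2 & -\sin\theta_2 \\ \sin\theta_2 & \cos\theta_2 \end{bmatrix}, \quad [T_2 \circ T_1] = \begin{bmatrix} \cos(\theta_1+\theta_2) & -\sin(\theta_1+\theta_2) \\ \sin(\theta_1+\theta_2) & \cos(\theta_1+\theta_2) \end{bmatrix} \]

These matrices should satisfy 4. With the help of some basic trigonometric identities, we can
confirm that this is so as follows:

\[
\begin{aligned}
[T_2][T_1] &= \begin{bmatrix} \cos\theta_2 & -\sin\theta_2 \\ \sin\theta_2 & \cos\theta_2 \end{bmatrix} \begin{bmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{bmatrix} \\
&= \begin{bmatrix} \cos\theta_2\cos\theta_1 - \sin\theta_2\sin\theta_1 & -(\cos\theta_2\sin\theta_1 + \sin\theta_2\cos\theta_1) \\ \sin\theta_2\cos\theta_1 + \cos\theta_2\sin\theta_1 & -\sin\theta_2\sin\theta_1 + \cos\theta_2\cos\theta_1 \end{bmatrix} \\
&= \begin{bmatrix} \cos(\theta_1+\theta_2) & -\sin(\theta_1+\theta_2) \\ \sin(\theta_1+\theta_2) & \cos(\theta_1+\theta_2) \end{bmatrix} = [T_2 \circ T_1]
\end{aligned}
\]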


Figure 4.10.2 
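The identity $[T_2][T_1] = [T_2 \circ T_1]$ for two rotations can be spot-checked numerically. The sketch below multiplies two rotation matrices for arbitrarily chosen angles and compares the product with the rotation through the sum of the angles; `rotation` and `matmul` are illustrative helper names.

```python
import math

def rotation(theta):
    """Standard matrix for a counterclockwise rotation of R^2 through theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s,  c]]

def matmul(A, B):
    """Product AB of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

theta1, theta2 = 0.4, 1.1  # arbitrary angles in radians
product = matmul(rotation(theta2), rotation(theta1))  # [T2][T1]
combined = rotation(theta1 + theta2)                  # [T2 o T1]

max_error = max(abs(product[i][j] - combined[i][j]) for i in range(2) for j in range(2))
print(max_error < 1e-12)  # True
```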


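EXAMPLE 2 Composition Is Not Commutative


Let $T_1: R^2 \to R^2$ be the reflection about the line y = x, and let $T_2: R^2 \to R^2$ be the orthogonal
projection on the y-axis. Figure 4.10.3 illustrates graphically that $T_1 \circ T_2$ and $T_2 \circ T_1$ have
different effects on a vector x. This same conclusion can be reached by showing that the standard
matrices for $T_1$ and $T_2$ do not commute:

\[ [T_1 \circ T_2] = [T_1][T_2] = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \]

\[ [T_2 \circ T_1] = [T_2][T_1] = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \]

so $[T_2 \circ T_1] \ne [T_1 \circ T_2]$.


Figure 4.10.3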
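The two products in Example 2 are easy to reproduce with integer arithmetic. In the following sketch the variable names are illustrative; the matrices are the standard matrices of the reflection about y = x and the orthogonal projection on the y-axis.

```python
def matmul(A, B):
    """Product AB of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

T1 = [[0, 1], [1, 0]]  # reflection about the line y = x
T2 = [[0, 0], [0, 1]]  # orthogonal projection on the y-axis

T1_after_T2 = matmul(T1, T2)  # standard matrix of T1 o T2
T2_after_T1 = matmul(T2, T1)  # standard matrix of T2 o T1

print(T1_after_T2)                 # [[0, 1], [0, 0]]
print(T2_after_T1)                 # [[0, 0], [1, 0]]
print(T1_after_T2 == T2_after_T1)  # False
```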


EXAMPLE 3 Composition of Two Reflections


Let $T_1: R^2 \to R^2$ be the reflection about the y-axis, and let $T_2: R^2 \to R^2$ be the reflection about the
x-axis. In this case $T_1 \circ T_2$ and $T_2 \circ T_1$ are the same; both map every vector x = (x, y) into its
negative −x = (−x, −y) (Figure 4.10.4):

\[ (T_1 \circ T_2)(x, y) = T_1(x, -y) = (-x, -y) \]
\[ (T_2 \circ T_1)(x, y) = T_2(-x, y) = (-x, -y) \]

The equality of $T_1 \circ T_2$ and $T_2 \circ T_1$ can also be deduced by showing that the standard matrices for
$T_1$ and $T_2$ commute:

\[ [T_1 \circ T_2] = [T_1][T_2] = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} \]

\[ [T_2 \circ T_1] = [T_2][T_1] = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} \]

The operator T(x) = −x on $R^2$ or $R^3$ is called the reflection about the origin. As the foregoing
computations show, the standard matrix for this operator on $R^2$ is

\[ [T] = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} \]


Figure 4.10.4


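EXAMPLE 4 Composition of Three Transformations

Find the standard matrix for the operator $T: R^3 \to R^3$ that first rotates a vector counterclockwise
about the z-axis through an angle $\theta$, then reflects the resulting vector about the yz-plane, and then
projects that vector orthogonally onto the xy-plane.


Solution The operator T can be expressed as the composition

\[ T = T_3 \circ T_2 \circ T_1 \]

where $T_1$ is the rotation about the z-axis, $T_2$ is the reflection about the yz-plane, and $T_3$ is the
orthogonal projection on the xy-plane. From Tables 6, 2, and 4 of Section 4.9, the standard
matrices for these operators are

\[ [T_1] = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad [T_2] = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad [T_3] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]

Thus, it follows from 5 that the standard matrix for T is

\[ [T] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} -\cos\theta & \sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 \end{bmatrix} \]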

One-to-One Matrix Transformations 


Our next objective is to establish a link between the invertibility of a matrix A and properties of the
corresponding matrix transformation $T_A$.


DEFINITION 1 


A matrix transformation $T_A: R^n \to R^m$ is said to be one-to-one if $T_A$ maps distinct vectors (points) in $R^n$
into distinct vectors (points) in $R^m$.


(See Figure 4.10.5). This idea can be expressed in various ways. For example, you should be able to see that the
following are just restatements of Definition 1:


1. $T_A$ is one-to-one if for each vector b in the range of A there is exactly one vector x in $R^n$ such that $T_A\mathbf{x} = \mathbf{b}$.


2. $T_A$ is one-to-one if the equality $T_A(\mathbf{u}) = T_A(\mathbf{v})$ implies that u = v.


Figure 4.10.5 One-to-one; not one-to-one


Rotation operators on $R^2$ are one-to-one since distinct vectors that are rotated through the same angle have
distinct images (Figure 4.10.6). In contrast, the orthogonal projection of $R^3$ on the xy-plane is not one-to-one
because it maps distinct points on the same vertical line into the same point (Figure 4.10.7).


Figure 4.10.7 The distinct points P and Q are mapped into the same point M 


The following theorem establishes a fundamental relationship between the invertibility of a matrix and properties 
of the corresponding matrix transformation. 


THEOREM 4.10.1 


If A is an n × n matrix and $T_A: R^n \to R^n$ is the corresponding matrix operator, then the following
statements are equivalent.


(a) A is invertible.
(b) The range of $T_A$ is $R^n$.
(c) $T_A$ is one-to-one.


Proof We will establish the chain of implications (a) ⇒ (b) ⇒ (c) ⇒ (a).


(a) ⇒ (b) Assume that A is invertible. By parts (a) and (e) of Theorem 4.8.10, the system Ax = b is consistent
for every n × 1 matrix b in $R^n$. This implies that $T_A$ maps x into the arbitrary vector b in $R^n$, which in turn
implies that the range of $T_A$ is all of $R^n$.


(b) ⇒ (c) Assume that the range of $T_A$ is $R^n$. This implies that for every vector b in $R^n$ there is some vector x
in $R^n$ for which $T_A(\mathbf{x}) = \mathbf{b}$ and hence that the linear system Ax = b is consistent for every vector b in $R^n$. But
the equivalence of parts (e) and (f) of Theorem 4.8.10 implies that Ax = b has a unique solution for every vector
b in $R^n$ and hence that for every vector b in the range of $T_A$ there is exactly one vector x in $R^n$ such that
$T_A(\mathbf{x}) = \mathbf{b}$.


(c) ⇒ (a) Assume that $T_A$ is one-to-one. Thus, if b is a vector in the range of $T_A$, there is a unique vector x in
$R^n$ for which $T_A(\mathbf{x}) = \mathbf{b}$. We leave it for you to complete the proof using Exercise 30.


EXAMPLE 5 Properties of a Rotation Operator


As indicated in Figure 4.10.6, the operator $T: R^2 \to R^2$ that rotates vectors in $R^2$ through an angle
$\theta$ is one-to-one. Confirm that [T] is invertible in accordance with Theorem 4.10.1.


Solution From Table 5 of Section 4.9 the standard matrix for T is

\[ [T] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]

This matrix is invertible because

\[ \det[T] = \begin{vmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{vmatrix} = \cos^2\theta + \sin^2\theta = 1 \ne 0 \]


EXAMPLE 6 Properties of a Projection Operator


As indicated in Figure 4.10.7, the operator $T: R^3 \to R^3$ that projects each vector in $R^3$
orthogonally on the xy-plane is not one-to-one. Confirm that [T] is not invertible in accordance
with Theorem 4.10.1.


Solution From Table 4 of Section 4.9 the standard matrix for T is

\[ [T] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]

This matrix is not invertible since det[T] = 0.
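Examples 5 and 6 both reduce to a determinant test, which the following sketch carries out. The functions `det2` and `det3` are small illustrative helpers (cofactor expansion), and the two sample points on a vertical line are arbitrary choices made here to show the failure of the one-to-one property.

```python
import math

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of entries)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

theta = 1.2  # arbitrary rotation angle in radians
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]
projection = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]

print(abs(det2(rotation) - 1) < 1e-12)  # True: the rotation matrix is invertible
print(det3(projection))                 # 0: the projection matrix is not invertible

# Two distinct points on the same vertical line have the same image.
print(apply(projection, [1, 2, 3]) == apply(projection, [1, 2, 7]))  # True
```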


Inverse of a One-to-One Matrix Operator 


If $T_A: R^n \to R^n$ is a one-to-one matrix operator, then it follows from Theorem 4.10.1 that A is invertible. The
matrix operator

\[ T_{A^{-1}}: R^n \to R^n \]

that corresponds to $A^{-1}$ is called the inverse operator or (more simply) the inverse of $T_A$. This terminology is
appropriate because $T_A$ and $T_{A^{-1}}$ cancel the effect of each other in the sense that if x is any vector in $R^n$, then

\[ T_A(T_{A^{-1}}(\mathbf{x})) = AA^{-1}\mathbf{x} = I\mathbf{x} = \mathbf{x} \]
\[ T_{A^{-1}}(T_A(\mathbf{x})) = A^{-1}A\mathbf{x} = I\mathbf{x} = \mathbf{x} \]

or, equivalently,

\[ T_A \circ T_{A^{-1}} = T_{AA^{-1}} = T_I \]
\[ T_{A^{-1}} \circ T_A = T_{A^{-1}A} = T_I \]

From a more geometric viewpoint, if w is the image of x under $T_A$, then $T_A^{-1}$ maps w back into x, since

\[ T_A^{-1}(\mathbf{w}) = T_A^{-1}(T_A(\mathbf{x})) = \mathbf{x} \]

(Figure 4.10.8).


Figure 4.10.8


Before considering examples, it will be helpful to touch on some notational matters. If $T_A: R^n \to R^n$ is a
one-to-one matrix operator, and if $T_A^{-1}: R^n \to R^n$ is its inverse, then the standard matrices for these operators
are related by the equation

\[ T_A^{-1} = T_{A^{-1}} \tag{6} \]

In cases where it is preferable not to assign a name to the matrix, we will write this equation as

\[ [T^{-1}] = [T]^{-1} \tag{7} \]


EXAMPLE 7 Standard Matrix for $T^{-1}$


Let $T: R^2 \to R^2$ be the operator that rotates each vector in $R^2$ through the angle $\theta$, so from Table 5
of Section 4.9,

\[ [T] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \tag{8} \]

It is evident geometrically that to undo the effect of T, one must rotate each vector in $R^2$ through
the angle $-\theta$. But this is exactly what the operator $T^{-1}$ does, since the standard matrix for $T^{-1}$ is

\[ [T^{-1}] = [T]^{-1} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{bmatrix} \]

(verify), which is the standard matrix for a rotation through the angle $-\theta$.


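EXAMPLE 8 Finding $T^{-1}$


Show that the operator $T: R^2 \to R^2$ defined by the equations

\[ w_1 = 2x_1 + x_2 \]
\[ w_2 = 3x_1 + 4x_2 \]

is one-to-one, and find $T^{-1}(w_1, w_2)$.


Solution The matrix form of these equations is

\[ \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

so the standard matrix for T is

\[ [T] = \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix} \]

This matrix is invertible (so T is one-to-one) and the standard matrix for $T^{-1}$ is

\[ [T^{-1}] = [T]^{-1} = \begin{bmatrix} \frac{4}{5} & -\frac{1}{5} \\ -\frac{3}{5} & \frac{2}{5} \end{bmatrix} \]

Thus

\[ [T^{-1}] \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \frac{4}{5} & -\frac{1}{5} \\ -\frac{3}{5} & \frac{2}{5} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \frac{4}{5}w_1 - \frac{1}{5}w_2 \\ -\frac{3}{5}w_1 + \frac{2}{5}w_2 \end{bmatrix} \]

from which we conclude that

\[ T^{-1}(w_1, w_2) = \left(\tfrac{4}{5}w_1 - \tfrac{1}{5}w_2, \; -\tfrac{3}{5}w_1 + \tfrac{2}{5}w_2\right) \]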
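The inverse in Example 8 can be reproduced with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction`; the function `inverse_2x2` implements the usual adjugate formula for a 2 × 2 matrix, and the test vector x = (7, −2) is an arbitrary choice made for illustration.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Inverse of a 2 x 2 integer matrix via the adjugate formula, with exact fractions."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of entries)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

T = [[2, 1], [3, 4]]      # standard matrix of T from Example 8
T_inv = inverse_2x2(T)    # entries 4/5, -1/5, -3/5, 2/5

x = [7, -2]               # arbitrary test vector
w = apply(T, x)           # image of x under T
recovered = apply(T_inv, w)

print(recovered == x)  # True: T^{-1} undoes the effect of T
```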


Linearity Properties 


Up to now we have focused exclusively on matrix transformations from $R^n$ to $R^m$. However, these are not the
only kinds of transformations from $R^n$ to $R^m$. For example, if $f_1, f_2, \ldots, f_m$ are any functions of the n
variables $x_1, x_2, \ldots, x_n$, then the equations

\[
\begin{aligned}
w_1 &= f_1(x_1, x_2, \ldots, x_n) \\
w_2 &= f_2(x_1, x_2, \ldots, x_n) \\
&\;\;\vdots \\
w_m &= f_m(x_1, x_2, \ldots, x_n)
\end{aligned}
\]

define a transformation $T: R^n \to R^m$ that maps the vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ into the vector $(w_1, w_2, \ldots, w_m)$.
But it is only in the case where these equations are linear that T is a matrix transformation. The question that we
will now consider is this:


Question 


Are there algebraic properties of a transformation $T: R^n \to R^m$ that can be used to determine whether T is
a matrix transformation?


The answer is provided by the following theorem. 


THEOREM 4.10.2 


$T: R^n \to R^m$ is a matrix transformation if and only if the following relationships hold for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $R^n$ and for every scalar $k$:


(i) $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ [Additivity property]
(ii) $T(k\mathbf{u}) = kT(\mathbf{u})$ [Homogeneity property]


Proof If $T$ is a matrix transformation, then properties (i) and (ii) follow respectively from parts (c) and (b) of Theorem 4.9.1.


Conversely, assume that properties (i) and (ii) hold. We must show that there exists an $m \times n$ matrix $A$ such that

\[
T(\mathbf{x}) = A\mathbf{x}
\]

for every vector $\mathbf{x}$ in $R^n$. As a first step, recall from Formula (10) of Section 4.9 that the additivity and homogeneity properties imply that

\[
T(k_1\mathbf{u}_1 + k_2\mathbf{u}_2 + \cdots + k_r\mathbf{u}_r) = k_1 T(\mathbf{u}_1) + k_2 T(\mathbf{u}_2) + \cdots + k_r T(\mathbf{u}_r) \tag{9}
\]

for all scalars $k_1, k_2, \ldots, k_r$ and all vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r$ in $R^n$. Let $A$ be the matrix

\[
A = [\,T(\mathbf{e}_1) \mid T(\mathbf{e}_2) \mid \cdots \mid T(\mathbf{e}_n)\,]
\]

in which $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are the standard basis vectors for $R^n$.


It follows from Theorem 1.3.1 that $A\mathbf{x}$ is a linear combination of the columns of $A$ in which the successive coefficients are the entries $x_1, x_2, \ldots, x_n$ of $\mathbf{x}$. That is,

\[
A\mathbf{x} = x_1 T(\mathbf{e}_1) + x_2 T(\mathbf{e}_2) + \cdots + x_n T(\mathbf{e}_n)
\]

Using (9) we can rewrite this as

\[
A\mathbf{x} = T(x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n) = T(\mathbf{x})
\]

which completes the proof.
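
The construction in the proof, in which the candidate matrix has columns $T(\mathbf{e}_1), \ldots, T(\mathbf{e}_n)$, can be sketched in code. The following is a minimal illustration under stated assumptions: the function names and the two sample transformations `linear` and `shifted` are hypothetical and chosen only for this sketch, and agreement on finitely many sample vectors is evidence, not a proof, that a transformation is linear.

```python
def standard_matrix(T, n):
    """Build A = [T(e_1) | ... | T(e_n)] and return it as a list of rows."""
    columns = []
    for j in range(n):
        e_j = tuple(1 if i == j else 0 for i in range(n))  # j-th standard basis vector
        columns.append(T(e_j))
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def matvec(A, x):
    """Matrix-vector product A x, returned as a tuple."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)

def agrees_with_matrix(T, n, samples):
    """True if T(x) equals A x on every sample vector."""
    A = standard_matrix(T, n)
    return all(tuple(T(x)) == matvec(A, x) for x in samples)

def linear(v):
    # Satisfies additivity and homogeneity.
    return (2 * v[0] + v[1], v[0] - v[1])

def shifted(v):
    # Fails both properties because of the constant term.
    return (v[0] + 1, v[1])

samples = [(0, 0), (1, 2), (-3, 5), (4, -1)]
print(standard_matrix(linear, 2))
print(agrees_with_matrix(linear, 2, samples))
print(agrees_with_matrix(shifted, 2, samples))
```

For `shifted`, the mismatch already shows at the zero vector, since a matrix transformation must send $\mathbf{0}$ to $\mathbf{0}$.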


The additivity and homogeneity properties in Theorem 4.10.2 are called linearity conditions, and a transformation that satisfies these conditions is called a linear transformation. Using this terminology, Theorem 4.10.2 can be restated as follows.


THEOREM 4.10.3 


Every linear transformation from $R^n$ to $R^m$ is a matrix transformation, and conversely, every matrix transformation from $R^n$ to $R^m$ is a linear transformation.


More on the Equivalence Theorem 


As our final result in this section, we will add parts (b) and (c) of Theorem 4.10.1 to Theorem 4.8.10.


THEOREM 4.10.4 Equivalent Statements


If $A$ is an $n \times n$ matrix, then the following statements are equivalent.

(a) $A$ is invertible.

(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.

(c) The reduced row echelon form of $A$ is $I_n$.

(d) $A$ is expressible as a product of elementary matrices.

(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.

(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.

(g) $\det(A) \neq 0$.

(h) The column vectors of $A$ are linearly independent.

(i) The row vectors of $A$ are linearly independent.

(j) The column vectors of $A$ span $R^n$.

(k) The row vectors of $A$ span $R^n$.

(l) The column vectors of $A$ form a basis for $R^n$.

(m) The row vectors of $A$ form a basis for $R^n$.

(n) $A$ has rank $n$.

(o) $A$ has nullity 0.

(p) The orthogonal complement of the null space of $A$ is $R^n$.

(q) The orthogonal complement of the row space of $A$ is $\{\mathbf{0}\}$.

(r) The range of $T_A$ is $R^n$.

(s) $T_A$ is one-to-one.


Concept Review

• Composition of matrix transformations

• Reflection about the origin

• One-to-one transformation

• Inverse of a matrix operator

• Linearity conditions

• Linear transformation

• Equivalent characterizations of invertible matrices

Skills

• Find the standard matrix for a composition of matrix transformations.

• Determine whether a matrix operator is one-to-one; if it is, then find the inverse operator.

• Determine whether a transformation is a linear transformation.


Exercise Set 4.10 


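In Exercises 1–2, let $T_A$ and $T_B$ be the operators whose standard matrices are given. Find the standard matrices for $T_B \circ T_A$ and $T_A \circ T_B$.


1. $A = \begin{bmatrix} 1 & -2 & 0 \\ 4 & 1 & -3 \\ 5 & 2 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & -3 & 3 \\ 5 & 0 & 1 \\ 6 & 1 & 7 \end{bmatrix}$

Answer:

$[T_B \circ T_A] = \begin{bmatrix} 5 & -1 & 21 \\ 10 & -8 & 4 \\ 45 & 3 & 25 \end{bmatrix}, \quad [T_A \circ T_B] = \begin{bmatrix} -8 & -3 & 1 \\ -5 & -15 & -8 \\ 44 & -11 & 45 \end{bmatrix}$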
2 6 3 =—1 0 4 
A=|2 0 1), B=] =1 a2 
4 —3 6 —3 8 


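3. Let $T_1(x_1, x_2) = (x_1 + x_2, x_1 - x_2)$ and $T_2(x_1, x_2) = (3x_1, 2x_1 + 4x_2)$.
(a) Find the standard matrices for $T_1$ and $T_2$.
(b) Find the standard matrices for $T_2 \circ T_1$ and $T_1 \circ T_2$.
(c) Use the matrices obtained in part (b) to find formulas for $T_1(T_2(x_1, x_2))$ and $T_2(T_1(x_1, x_2))$.


Answer:


(a) $[T_1] = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \quad [T_2] = \begin{bmatrix} 3 & 0 \\ 2 & 4 \end{bmatrix}$


(b) $[T_2 \circ T_1] = \begin{bmatrix} 3 & 3 \\ 6 & -2 \end{bmatrix}, \quad [T_1 \circ T_2] = \begin{bmatrix} 5 & 4 \\ 1 & -4 \end{bmatrix}$


(c) $T_2(T_1(x_1, x_2)) = (3x_1 + 3x_2, 6x_1 - 2x_2)$,
$T_1(T_2(x_1, x_2)) = (5x_1 + 4x_2, x_1 - 4x_2)$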


4. Let $T_1(x_1, x_2, x_3) = (4x_1, -2x_1 + x_2, -x_1 - 3x_2)$ and $T_2(x_1, x_2, x_3) = (x_1 + 2x_2, -x_3, 4x_1 - 2x_3)$.
(a) Find the standard matrices for $T_1$ and $T_2$.
(b) Find the standard matrices for $T_2 \circ T_1$ and $T_1 \circ T_2$.
(c) Use the matrices obtained in part (b) to find formulas for $T_1(T_2(x_1, x_2, x_3))$ and $T_2(T_1(x_1, x_2, x_3))$.
5. Find the standard matrix for the stated composition in $R^2$.

(a) A rotation of 90°, followed by a reflection about the line $y = x$.

(b) An orthogonal projection on the y-axis, followed by a contraction with factor $k = \frac{1}{2}$.

(c) A reflection about the x-axis, followed by a dilation with factor $k = 3$.


Answer:

(a) $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$

(b) $\begin{bmatrix} 0 & 0 \\ 0 & \frac{1}{2} \end{bmatrix}$

(c) $\begin{bmatrix} 3 & 0 \\ 0 & -3 \end{bmatrix}$


6. Find the standard matrix for the stated composition in $R^2$.

(a) A rotation of 60°, followed by an orthogonal projection on the x-axis, followed by a reflection about the line $y = x$.

(b) A dilation with factor $k = 2$, followed by a rotation of 45°, followed by a reflection about the y-axis.

(c) A rotation of 15°, followed by a rotation of 105°, followed by a rotation of 60°.

7. Find the standard matrix for the stated composition in $R^3$.

(a) A reflection about the yz-plane, followed by an orthogonal projection on the xz-plane.

(b) A rotation of 45° about the y-axis, followed by a dilation with factor $k = \sqrt{2}$.

(c) An orthogonal projection on the xy-plane, followed by a reflection about the yz-plane.


Answer:

(a) $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 1 \\ 0 & \sqrt{2} & 0 \\ -1 & 0 & 1 \end{bmatrix}$

(c) $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

8. Find the standard matrix for the stated composition in $R^3$.
(a) A rotation of 30° about the x-axis, followed by a rotation of 30° about the z-axis, followed by a 


contraction with factor & = re 


(b) A reflection about the xy-plane, followed by a reflection about the xz-plane, followed by an orthogonal 
projection on the yz-plane. 


(c) A rotation of 270° about the x-axis, followed by a rotation of 90° about the y-axis, followed by a rotation 
of 180° about the z-axis. 


9. Determine whether $T_1 \circ T_2 = T_2 \circ T_1$.

(a) $T_1: R^2 \to R^2$ is the orthogonal projection on the x-axis, and $T_2: R^2 \to R^2$ is the orthogonal projection on the y-axis.

(b) $T_1: R^2 \to R^2$ is the rotation through an angle $\theta_1$, and $T_2: R^2 \to R^2$ is the rotation through an angle $\theta_2$.

(c) $T_1: R^2 \to R^2$ is the orthogonal projection on the x-axis, and $T_2: R^2 \to R^2$ is the rotation through an angle $\theta$.


Answer:

(a) $T_1 \circ T_2 = T_2 \circ T_1$

(b) $T_1 \circ T_2 = T_2 \circ T_1$

(c) $T_1 \circ T_2 \neq T_2 \circ T_1$

10. Determine whether $T_1 \circ T_2 = T_2 \circ T_1$.

(a) $T_1: R^3 \to R^3$ is a dilation by a factor $k$, and $T_2: R^3 \to R^3$ is the rotation about the z-axis through an angle $\theta$.

(b) $T_1: R^3 \to R^3$ is the rotation about the x-axis through an angle $\theta_1$, and $T_2: R^3 \to R^3$ is the rotation about the z-axis through an angle $\theta_2$.


11. By inspection, determine whether the matrix operator is one-to-one.
(a) the orthogonal projection on the x-axis in $R^2$
(b) the reflection about the y-axis in $R^2$
(c) the reflection about the line $y = x$ in $R^2$
(d) a contraction with factor $k > 0$ in $R^2$
(e) a rotation about the z-axis in $R^3$
(f) a reflection about the xy-plane in $R^3$
(g) a dilation with factor $k > 0$ in $R^3$


Answer:

(a) Not one-to-one
(b) One-to-one
(c) One-to-one
(d) One-to-one
(e) One-to-one
(f) One-to-one
(g) One-to-one


12. Find the standard matrix for the matrix operator defined by the equations, and use Theorem 4.10.4 to determine whether the operator is one-to-one.

(a) $w_1 = 8x_1 + 4x_2$
    $w_2 = 2x_1 + x_2$
(b) $w_1 = 2x_1 - 3x_2$
    $w_2 = 5x_1 + x_2$
(c) $w_1 = -x_1 + 3x_2 + 2x_3$
    $w_2 = 2x_1 + 4x_3$
    $w_3 = x_1 + 3x_2 + 6x_3$
(d) $w_1 = x_1 + 2x_2 + 3x_3$
    $w_2 = 2x_1 + 5x_2 + 3x_3$
    $w_3 = x_1 + 8x_3$


13. Determine whether the matrix operator $T: R^2 \to R^2$ defined by the equations is one-to-one; if so, find the standard matrix for the inverse operator, and find $T^{-1}(w_1, w_2)$.

(a) $w_1 = x_1 + 2x_2$
    $w_2 = -x_1 + x_2$
(b) $w_1 = 4x_1 - 6x_2$
    $w_2 = -2x_1 + 3x_2$
(c) $w_1 = -x_2$
    $w_2 = -x_1$
(d) $w_1 = 3x_1$
    $w_2 = -5x_1$

Answer:

(a) One-to-one; $\begin{bmatrix} \frac{1}{3} & -\frac{2}{3} \\ \frac{1}{3} & \frac{1}{3} \end{bmatrix}$; $T^{-1}(w_1, w_2) = \left( \tfrac{1}{3} w_1 - \tfrac{2}{3} w_2, \; \tfrac{1}{3} w_1 + \tfrac{1}{3} w_2 \right)$

(b) Not one-to-one

(c) One-to-one; $\begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$; $T^{-1}(w_1, w_2) = (-w_2, -w_1)$

(d) Not one-to-one

14. Determine whether the matrix operator $T: R^3 \to R^3$ defined by the equations is one-to-one; if so, find the standard matrix for the inverse operator, and find $T^{-1}(w_1, w_2, w_3)$.

(a) $w_1 = x_1 - 2x_2 + 2x_3$
    $w_2 = 2x_1 + x_2 + x_3$
    $w_3 = x_1 + x_2$

(b) $w_1 = x_1 - 3x_2 + 4x_3$
    $w_2 = -x_1 + x_2 + x_3$
    $w_3 = -2x_2 + 5x_3$

(c) $w_1 = x_1 + 4x_2 - x_3$
    $w_2 = 2x_1 + 7x_2 + x_3$
    $w_3 = x_1 + 3x_2$

(d) $w_1 = x_1 + 2x_2 + x_3$
    $w_2 = -2x_1 + x_2 + 4x_3$
    $w_3 = 7x_1 + 4x_2 - 5x_3$


15. By inspection, find the inverse of the given one-to-one matrix operator.
(a) The reflection about the x-axis in $R^2$.
(b) The rotation through an angle of $\pi/4$ in $R^2$.
(c) The dilation by a factor of 3 in $R^2$.
(d) The reflection about the yz-plane in $R^3$.
(e) The contraction by a factor of $\frac{1}{5}$ in $R^3$.


Answer:

(a) Reflection about the x-axis
(b) Rotation through the angle $-\pi/4$
(c) Contraction by a factor of $\frac{1}{3}$
(d) Reflection about the yz-plane
(e) Dilation by a factor of 5


In Exercises 16–17, use Theorem 4.10.2 to determine whether $T: R^2 \to R^2$ is a matrix operator.

16. (a) $T(x, y) = (2x, y)$
    (b) $T(x, y) = (x^2, y)$
    (c) $T(x, y) = (-y, x)$
    (d) $T(x, y) = (x, 0)$

17. (a) $T(x, y) = (2x + y, x - y)$
    (b) $T(x, y) = (x + 1, y)$
    (c) $T(x, y) = (y, y)$
    (d) $T(x, y) = (\sqrt[3]{x}, \sqrt[3]{y})$


Answer:

(a) Matrix operator
(b) Not a matrix operator
(c) Matrix operator
(d) Not a matrix operator


In Exercises 18–19, use Theorem 4.10.2 to determine whether $T: R^3 \to R^2$ is a matrix transformation.


18. (a) $T(x, y, z) = (x, x + y + z)$
    (b) $T(x, y, z) = (1, 1)$


19. (a) $T(x, y, z) = (0, 0)$
    (b) $T(x, y, z) = (3x - 4y, 2x - 5z)$


Answer:

(a) Matrix transformation

(b) Matrix transformation


20. In each part, use Theorem 4.10.3 to find the standard matrix for the matrix operator from the images of the standard basis vectors.

(a) The reflection operators on $R^2$ in Table 1 of Section 4.9.
(b) The reflection operators on $R^3$ in Table 2 of Section 4.9.
(c) The projection operators on $R^2$ in Table 3 of Section 4.9.
(d) The projection operators on $R^3$ in Table 4 of Section 4.9.
(e) The rotation operators on $R^2$ in Table 5 of Section 4.9.
(f) The dilation and contraction operators on $R^3$ in Table 8 of Section 4.9.


21. Find the standard matrix for the given matrix operator.

(a) $T: R^2 \to R^2$ projects a vector orthogonally onto the x-axis and then reflects that vector about the y-axis.
(b) $T: R^2 \to R^2$ reflects a vector about the line $y = x$ and then reflects that vector about the x-axis.
(c) $T: R^2 \to R^2$ dilates a vector by a factor of 3, then reflects that vector about the line $y = x$, and then projects that vector orthogonally onto the y-axis.


Answer:

(a) $\begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 0 \\ 3 & 0 \end{bmatrix}$


22. Find the standard matrix for the given matrix operator.


(a) $T: R^3 \to R^3$ reflects a vector about the xz-plane and then contracts that vector by a factor of 7

(b) $T: R^3 \to R^3$ projects a vector orthogonally onto the xz-plane and then projects that vector orthogonally onto the xy-plane.

(c) $T: R^3 \to R^3$ reflects a vector about the xy-plane, then reflects that vector about the xz-plane, and then reflects that vector about the yz-plane.


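23. Let $T_A: R^3 \to R^3$ be multiplication by

\[
A = \begin{bmatrix} -1 & 3 & 0 \\ 2 & 1 & 2 \\ 4 & 5 & -3 \end{bmatrix}
\]

and let $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ be the standard basis vectors for $R^3$. Find the following vectors by inspection.
(a) $T_A(\mathbf{e}_1)$, $T_A(\mathbf{e}_2)$, and $T_A(\mathbf{e}_3)$

(b) $T_A(\mathbf{e}_1 + \mathbf{e}_2 + \mathbf{e}_3)$

(c) $T_A(7\mathbf{e}_3)$


Answer:

(a) $T_A(\mathbf{e}_1) = (-1, 2, 4)$, $T_A(\mathbf{e}_2) = (3, 1, 5)$, $T_A(\mathbf{e}_3) = (0, 2, -3)$
(b) $T_A(\mathbf{e}_1 + \mathbf{e}_2 + \mathbf{e}_3) = (2, 5, 6)$
(c) $T_A(7\mathbf{e}_3) = (0, 14, -21)$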


24. Determine whether multiplication by $A$ is a one-to-one matrix transformation.
(a) 1 =1 
A=/2 0 
3 =—4 
(b) a ie 
-1 0 =4 
(c) ‘oe ae | 
0 1 1 
A= 
Pa 0 
10 =1 


25. (a) Is a composition of one-to-one matrix transformations one-to-one? Justify your conclusion.


(b) Can the composition of a one-to-one matrix transformation and a matrix transformation that is not 
one-to-one be one-to-one? Account for both possible orders of composition and justify your conclusion. 


Answer: 


(a) Yes 
(b) Yes 


26. Show that $T(x, y) = (0, 0)$ defines a matrix operator on $R^2$ but $T(x, y) = (1, 1)$ does not.


27. (a) Prove: If $T: R^n \to R^m$ is a matrix transformation, then $T(\mathbf{0}) = \mathbf{0}$; that is, $T$ maps the zero vector in $R^n$ into the zero vector in $R^m$.

(b) The converse of this is not true. Find an example of a function that satisfies $T(\mathbf{0}) = \mathbf{0}$ but is not a matrix transformation.


Answer:

(b) $T(x_1, x_2) = (x_1^2 + x_2^2, \; x_1 x_2)$


28. Prove: An $n \times n$ matrix $A$ is invertible if and only if the linear system $A\mathbf{x} = \mathbf{w}$ has exactly one solution for every vector $\mathbf{w}$ in $R^n$ for which the system is consistent.


29. Let $A$ be an $n \times n$ matrix such that $\det(A) = 0$, and let $T: R^n \to R^n$ be multiplication by $A$.
(a) What can you say about the range of the matrix operator $T$? Give an example that illustrates your conclusion.

(b) What can you say about the number of vectors that $T$ maps into $\mathbf{0}$?

Answer:

(a) The range of $T$ is a proper subset of $R^n$.

(b) $T$ must map infinitely many vectors to $\mathbf{0}$.

30. Prove: If the matrix transformation $T_A: R^n \to R^n$ is one-to-one, then $A$ is invertible.


True-False Exercises 


In parts (a)-(f) determine whether the statement is true or false, and justify your answer.

(a) If $T: R^n \to R^m$ and $T(\mathbf{0}) = \mathbf{0}$, then $T$ is a matrix transformation.

Answer: False

(b) If $T: R^n \to R^m$ and $T(c_1\mathbf{x} + c_2\mathbf{y}) = c_1 T(\mathbf{x}) + c_2 T(\mathbf{y})$ for all scalars $c_1$ and $c_2$ and all vectors $\mathbf{x}$ and $\mathbf{y}$ in $R^n$, then $T$ is a matrix transformation.

Answer: True

(c) If $T: R^n \to R^m$ is a one-to-one matrix transformation, then there are no distinct vectors $\mathbf{x}$ and $\mathbf{y}$ for which $T(\mathbf{x} - \mathbf{y}) = \mathbf{0}$.

Answer: True

(d) If $T: R^n \to R^m$ is a matrix transformation and $m > n$, then $T$ is one-to-one.

Answer: False

(e) If $T: R^n \to R^m$ is a matrix transformation and $m = n$, then $T$ is one-to-one.

Answer: False

(f) If $T: R^n \to R^m$ is a matrix transformation and $m < n$, then $T$ is one-to-one.

Answer: False




4.11 Geometry of Matrix Operators on $R^2$


In this optional section we will discuss matrix operators on $R^2$ in a little more depth. The ideas that we will develop here have important applications to computer graphics.


Transformations of Regions 


In Section 4.9 we focused on the effect that a matrix operator has on individual vectors in $R^2$ and $R^3$. However, it is also important to understand how such operators affect the shapes of regions. For example, Figure 4.11.1 shows a famous picture of Albert Einstein and three computer-generated modifications of that image that result from matrix operators on $R^2$. The original picture was scanned and then digitized to decompose it into a rectangular array of pixels. The pixels were then transformed as follows:

• The program MATLAB was used to assign coordinates and a gray level to each pixel.

• The coordinates of the pixels were transformed by matrix multiplication.

• The pixels were then assigned their original gray levels to produce the transformed picture.
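
The three steps above can be sketched in a few lines. This is not the MATLAB program mentioned in the text; it is a minimal Python illustration under stated assumptions: the function name `transform_pixels`, the dictionary representation of the image, and the gray-level values are all hypothetical. A practical implementation would also need to resample the transformed coordinates onto a pixel grid, which is omitted here.

```python
def transform_pixels(pixels, matrix):
    """Map each pixel's (x, y) through a 2x2 matrix and keep its gray level.

    `pixels` maps (x, y) coordinates to a gray level; `matrix` is a
    standard matrix given as nested lists [[a, b], [c, d]].
    """
    (a, b), (c, d) = matrix
    transformed = {}
    for (x, y), gray in pixels.items():
        new_xy = (a * x + b * y, c * x + d * y)  # step 2: matrix multiplication
        transformed[new_xy] = gray               # step 3: reuse original gray level
    return transformed

# Step 1: a tiny 2x2 "image" with made-up gray levels.
image = {(0, 0): 10, (1, 0): 80, (0, 1): 160, (1, 1): 250}

horizontal_shear = [[1, 2], [0, 1]]  # sends (x, y) to (x + 2y, y)
sheared = transform_pixels(image, horizontal_shear)
print(sheared)
```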


Figure 4.11.1 (panel labels: Rotated; Sheared horizontally; Compressed horizontally)


The overall effect of a matrix operator on $R^2$ can often be ascertained by graphing the images of the vertices (0, 0), (1, 0), (0, 1), and (1, 1) of the unit square (Figure 4.11.2). Table 1 shows the effect that some of the matrix operators studied in Section 4.9 have on the unit square. For clarity, we have shaded a portion of the original square and its corresponding image.


Figure 4.11.2 (panel labels: Unit square; Unit square rotated; Unit square reflected about the y-axis; Unit square reflected about the line y = x)


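Table 1

Operator                                                        Standard Matrix                                                                              Effect on the Unit Square

Reflection about the y-axis                                     $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$                                              (figure)

Reflection about the x-axis                                     $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$                                              (figure)

Reflection about the line y = x                                 $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$                                               (figure)

Counterclockwise rotation through an angle θ                    $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$          (figure)

Compression in the x-direction by a factor of k (0 < k < 1)     $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$                                               (figure)

Expansion in the x-direction by a factor of k (k > 1)           $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$                                               (figure)

Shear in the x-direction with factor k > 0                      $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$                                               (figure: (x, y) moves to (x + ky, y))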


EXAMPLE 1 Transforming with Diagonal Matrices


Suppose that the xy-plane first is compressed or expanded by a factor of $k_1$ in the x-direction and then is compressed or expanded by a factor of $k_2$ in the y-direction. Find a single matrix operator that performs both operations.


Solution The standard matrices for the two operations are

\[
\underbrace{\begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix}}_{\text{x-compression (expansion)}} \qquad \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & k_2 \end{bmatrix}}_{\text{y-compression (expansion)}}
\]

Thus, the standard matrix for the composition of the x-operation followed by the y-operation is

\[
A = \begin{bmatrix} 1 & 0 \\ 0 & k_2 \end{bmatrix} \begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix} \tag{1}
\]

This shows that multiplication by a diagonal $2 \times 2$ matrix compresses or expands the plane in the x-direction and also in the y-direction. In the special case where $k_1$ and $k_2$ are the same, say $k_1 = k_2 = k$, Formula (1) simplifies to

\[
A = \begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix}
\]

which is a contraction or a dilation (Table 7 of Section 4.9).


EXAMPLE 2 Finding Matrix Operators


(a) Find the standard matrix for the operator on $R^2$ that first shears by a factor of 2 in the x-direction and then reflects the result about the line $y = x$. Sketch the image of the unit square under this operator.


(b) Find the standard matrix for the operator on $R^2$ that first reflects about $y = x$ and then shears by a factor of 2 in the x-direction. Sketch the image of the unit square under this operator.


(c) Confirm that the shear and the reflection in parts (a) and (b) do not commute.


Solution


(a) The standard matrix for the shear is

\[
A_1 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}
\]

and for the reflection is

\[
A_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\]

Thus, the standard matrix for the shear followed by the reflection is

\[
A_2 A_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix}
\]

(b) The standard matrix for the reflection followed by the shear is

\[
A_1 A_2 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix}
\]

(c) The computations in Solutions (a) and (b) show that $A_1 A_2 \neq A_2 A_1$, so the standard matrices, and hence the operators, do not commute. The same conclusion follows from Figures 4.11.3 and 4.11.4, since the two operators produce different images of the unit square.


Figure 4.11.3 (panel labels: Reflection about y = x; Shear in the x-direction with k = 2)


Figure 4.11.4 (panel labels: Shear in the x-direction with k = 2; Reflection about y = x)
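
The products in Example 2 and the resulting images of the unit square can be checked directly. The sketch below is illustrative only; the helper names `matmul` and `apply` are assumptions. The standard matrix of "first $A_1$, then $A_2$" is the product $A_2 A_1$, so the operator applied first sits on the right.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(m, p):
    """Image of the point p = (x, y) under multiplication by m."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)

shear = [[1, 2], [0, 1]]    # shear by a factor of 2 in the x-direction
reflect = [[0, 1], [1, 0]]  # reflection about the line y = x

shear_then_reflect = matmul(reflect, shear)  # part (a)
reflect_then_shear = matmul(shear, reflect)  # part (b)

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
image_a = [apply(shear_then_reflect, p) for p in unit_square]
image_b = [apply(reflect_then_shear, p) for p in unit_square]

print(shear_then_reflect, image_a)
print(reflect_then_shear, image_b)
print(shear_then_reflect != reflect_then_shear)  # the operators do not commute
```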


Geometry of One-to-One Matrix Operators 


We will now turn our attention to one-to-one matrix operators on $R^2$, which are important because they map distinct points into distinct points. Recall from Theorem 4.10.4 (the Equivalence Theorem) that a matrix transformation $T_A$ is one-to-one if and only if $A$ can be expressed as a product of elementary matrices. Thus, we can analyze the effect of any one-to-one transformation $T_A$ by first factoring the matrix $A$ into a product of elementary matrices, say

\[
A = E_1 E_2 \cdots E_r
\]

and then expressing $T_A$ as the composition

\[
T_A = T_{E_1 E_2 \cdots E_r} = T_{E_1} \circ T_{E_2} \circ \cdots \circ T_{E_r} \tag{2}
\]


The following theorem explains the geometric effect of matrix operators corresponding to elementary matrices. 


THEOREM 4.11.1 


If $E$ is an elementary matrix, then $T_E: R^2 \to R^2$ is one of the following:

(a) A shear along a coordinate axis.

(b) A reflection about $y = x$.

(c) A compression along a coordinate axis.

(d) An expansion along a coordinate axis.

(e) A reflection about a coordinate axis.

(f) A compression or expansion along a coordinate axis followed by a reflection about a coordinate axis.


Proof Because a $2 \times 2$ elementary matrix results from performing a single elementary row operation on the $2 \times 2$ identity matrix, such a matrix must have one of the following forms (verify):

\[
\begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}
\]

The first two matrices represent shears along coordinate axes, and the third represents a reflection about $y = x$. If $k > 0$, the last two matrices represent compressions or expansions along coordinate axes, depending on whether $0 < k < 1$ or $k > 1$. If $k < 0$, and if we express $k$ in the form $k = -k_1$, where $k_1 > 0$, then the last two matrices can be written as

\[
\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -k_1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix} \tag{3}
\]

\[
\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -k_1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & k_1 \end{bmatrix} \tag{4}
\]

Since $k_1 > 0$, the product in (3) represents a compression or expansion along the x-axis followed by a reflection about the y-axis, and (4) represents a compression or expansion along the y-axis followed by a reflection about the x-axis. In the case where $k = -1$, transformations (3) and (4) are simply reflections about the y-axis and x-axis, respectively.


Since every invertible matrix is a product of elementary matrices, the following result follows from Theorem 4.11.1 and 
Formula 2. 


THEOREM 4.11.2 


If $T_A: R^2 \to R^2$ is multiplication by an invertible matrix $A$, then the geometric effect of $T_A$ is the same as an appropriate succession of shears, compressions, expansions, and reflections.


EXAMPLE 3 Analyzing the Geometric Effect of a Matrix Operator 


Assuming that $k_1$ and $k_2$ are positive, express the diagonal matrix

\[
A = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix}
\]

as a product of elementary matrices, and describe the geometric effect of multiplication by $A$ in terms of compressions and expansions.


Solution From Example 1 we have

\[
A = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & k_2 \end{bmatrix} \begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix}
\]

which shows that multiplication by $A$ has the geometric effect of compressing or expanding by a factor of $k_1$ in the x-direction and then compressing or expanding by a factor of $k_2$ in the y-direction.


EXAMPLE 4 Analyzing the Geometric Effect of a Matrix Operator 


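Express

\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\]

as a product of elementary matrices, and then describe the geometric effect of multiplication by $A$ in terms of shears, compressions, expansions, and reflections.


Solution $A$ can be reduced to $I$ as follows:

\[
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]

The three steps are: add $-3$ times the first row to the second; multiply the second row by $-\frac{1}{2}$; add $-2$ times the second row to the first.

The three successive row operations can be performed by multiplying $A$ on the left successively by

\[
E_1 = \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 \\ 0 & -\frac{1}{2} \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}
\]

Inverting these matrices and using Formula 4 of Section 1.5 yields

\[
A = E_1^{-1} E_2^{-1} E_3^{-1} = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}
\]

Reading from right to left and noting that

\[
\begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}
\]

it follows that the effect of multiplying by $A$ is equivalent to

1. shearing by a factor of 2 in the x-direction,

2. then expanding by a factor of 2 in the y-direction,

3. then reflecting about the x-axis,

4. then shearing by a factor of 3 in the y-direction.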
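
The four-step description can be confirmed by multiplying the corresponding matrices in the order they are applied. This is a minimal sketch with assumed helper and variable names; because each new operation multiplies on the left, the first operation applied is the rightmost factor.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

shear_x = [[1, 2], [0, 1]]     # step 1: shear by a factor of 2 in the x-direction
expand_y = [[1, 0], [0, 2]]    # step 2: expand by a factor of 2 in the y-direction
reflect_x = [[1, 0], [0, -1]]  # step 3: reflect about the x-axis
shear_y = [[1, 0], [3, 1]]     # step 4: shear by a factor of 3 in the y-direction

# Compose in application order: each later step multiplies on the left.
A = shear_x
for step in (expand_y, reflect_x, shear_y):
    A = matmul(step, A)

print(A)
```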


Images of Lines Under Matrix Operators 


Many images in computer graphics are constructed by connecting points with line segments. The following theorem, 
some of whose parts are proved in the exercises, is helpful for understanding how matrix operators transform such 
figures. 


THEOREM 4.11.3 


If $T: R^2 \to R^2$ is multiplication by an invertible matrix, then:

(a) The image of a straight line is a straight line. 

(b) The image of a straight line through the origin is a straight line through the origin. 

(c) The images of parallel straight lines are parallel straight lines. 

(d) The image of the line segment joining points P and Q is the line segment joining the images of P and Q. 


(e) The images of three points lie on a line if and only if the points themselves lie on a line. 


Note that it follows from Theorem 4.11.3 that if $A$ is an invertible $2 \times 2$ matrix, then multiplication by $A$ maps triangles into triangles and parallelograms into parallelograms.


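EXAMPLE 5 Image of a Square


Sketch the image of the square with vertices (0, 0), (1, 0), (1, 1), and (0, 1) under multiplication by

\[
A = \begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix}
\]

Solution Since

\[
\begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad \begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}
\]

\[
\begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}, \qquad \begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]

the image of the square is a parallelogram with vertices (0, 0), (−1, 2), (2, −1), and (1, 1) (Figure 4.11.5).


Figure 4.11.5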
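
That the image in Example 5 is a parallelogram can be tested by comparing edge vectors: in a parallelogram traversed in order, each edge is the negative of the opposite edge. The sketch below uses assumed helper names and walks the square's vertices in the order (0, 0), (1, 0), (1, 1), (0, 1).

```python
A = [[-1, 2], [2, -1]]

def apply(m, p):
    """Image of the point p = (x, y) under multiplication by m."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]   # vertices in traversal order
image = [apply(A, p) for p in square]

# Edge i runs from image vertex i to image vertex i + 1 (cyclically).
edges = [
    (image[(i + 1) % 4][0] - image[i][0], image[(i + 1) % 4][1] - image[i][1])
    for i in range(4)
]

# Opposite edges of a parallelogram are negatives of each other.
opposite_sides_match = all(
    edges[i] == (-edges[i + 2][0], -edges[i + 2][1]) for i in range(2)
)

print(image)
print(opposite_sides_match)
```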


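EXAMPLE 6 Image of a Line

According to Theorem 4.11.3, the invertible matrix

\[
A = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}
\]

maps the line $y = 2x + 1$ into another line. Find its equation.


Solution Let $(x, y)$ be a point on the line $y = 2x + 1$, and let $(x', y')$ be its image under multiplication by $A$. Then

\[
\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}^{-1} \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ -2 & 3 \end{bmatrix} \begin{bmatrix} x' \\ y' \end{bmatrix}
\]

so

\[
\begin{aligned}
x &= x' - y' \\
y &= -2x' + 3y'
\end{aligned}
\]

Substituting in $y = 2x + 1$ yields

\[
-2x' + 3y' = 2(x' - y') + 1 \quad \text{or equivalently} \quad y' = \tfrac{4}{5} x' + \tfrac{1}{5}
\]

Thus $(x', y')$ satisfies

\[
y = \tfrac{4}{5} x + \tfrac{1}{5}
\]

which is the equation we want.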
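
The image line can be spot-checked by mapping several points of $y = 2x + 1$ through $A$ and testing them against $y = \tfrac{4}{5}x + \tfrac{1}{5}$. The sketch below is illustrative; the sample points and the helper name `image` are assumptions, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction

A = [[3, 1], [2, 1]]

def image(point):
    """Image of (x, y) under multiplication by A."""
    x, y = point
    return (A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y)

# Sample points on the original line y = 2x + 1.
original_points = [(x, 2 * x + 1) for x in range(-2, 3)]
image_points = [image(p) for p in original_points]

# Every image point should satisfy y' = (4/5) x' + 1/5.
on_image_line = all(
    yp == Fraction(4, 5) * xp + Fraction(1, 5) for xp, yp in image_points
)

print(image_points)
print(on_image_line)
```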


Concept Review

• Effect of a matrix operator on the unit square

• Geometry of one-to-one matrix operators

• Images of lines under matrix operators


Skills

• Find standard matrices for geometric transformations of $R^2$.

• Describe the geometric effect of an invertible matrix operator.

• Find the image of the unit square under a matrix operator.

• Find the image of a line under a matrix operator.


Exercise Set 4.11 


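1. Find the standard matrix for the operator $T: R^2 \to R^2$ that maps a point $(x, y)$ into

(a) its reflection about the line $y = -x$.
(b) its reflection through the origin.
(c) its orthogonal projection on the x-axis.
(d) its orthogonal projection on the y-axis.


Answer:

(a) $\begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$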

2. For each part of Exercise 1, use the matrix you have obtained to compute $T(2, 1)$. Check your answers geometrically by plotting the points (2, 1) and $T(2, 1)$.


3. Find the standard matrix for the operator $T: R^3 \to R^3$ that maps a point $(x, y, z)$ into

(a) its reflection through the xy-plane.
(b) its reflection through the xz-plane.
(c) its reflection through the yz-plane.


Answer:

(a) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(c) $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

4. For each part of Exercise 3, use the matrix you have obtained to compute $T(1, 1, 1)$. Check your answers geometrically by plotting the points (1, 1, 1) and $T(1, 1, 1)$.


5. Find the standard matrix for the operator $T: R^3 \to R^3$ that

(a) rotates each vector 90° counterclockwise about the z-axis (looking along the positive z-axis toward the origin).
(b) rotates each vector 90° counterclockwise about the x-axis (looking along the positive x-axis toward the origin).
(c) rotates each vector 90° counterclockwise about the y-axis (looking along the positive y-axis toward the origin).


Answer:

(a) $\begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{bmatrix}$


6. Sketch the image of the rectangle with vertices (0, 0), (1, 0), (1, 2), and (0, 2) under

(a) a reflection about the x-axis.

(b) a reflection about the y-axis.


(C) acompression of factor k = ; in the y-direction. 


(d) an expansion of factor $k = 2$ in the x-direction.

(e) a shear of factor $k = 3$ in the x-direction.

(f) a shear of factor $k = 3$ in the y-direction.


7. Sketch the image of the square with vertices (0, 0), (1, 0), (0, 1), and (1, 1) under multiplication by


fi 


Answer:


Rectangle with vertices at (0, 0), (−3, 0), (0, 1), (−3, 1)


8. Find the matrix that rotates a point $(x, y)$ about the origin through


(a) 45° 
(b) 90° 
(c) 180° 
(d) 270° 


(e) —30° 
9. Find the matrix that shears by 


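(a) a factor of $k = 4$ in the y-direction.

(b) a factor of $k = -2$ in the x-direction.


Answer:

(a) $\begin{bmatrix} 1 & 0 \\ 4 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}$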


10. Find the matrix that compresses or expands by 


(a) a factor of 4 in the y-direction. 


3 


(b) a factor of 6 in the x-direction. 


11. In each part, describe the geometric effect of multiplication by $A$.

(a) $A = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & 0 \\ 0 & -5 \end{bmatrix}$

(c) $A = \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}$


Answer:

(a) Expansion by a factor of 3 in the x-direction
(b) Expansion by a factor of 5 in the y-direction and reflection about the x-axis
(c) Shearing by a factor of 4 in the x-direction


12. In each part, express the matrix as a product of elementary matrices, and then describe the effect of multiplication by 
A in terms of compressions, expansions, reflections, and shears. 


ale 
acl 
onl 
ea[h 3 


13. In each part, find a single matrix that performs the indicated succession of operations. 


(a) Compresses by a factor of > in the x-direction, then expands by a factor of 5 in the y-direction. 


(b) Expands by a factor of 5 in the y-direction, then shears by a factor of 2 in the y-direction. 
(c) Reflects about y = x, then rotates through an angle of 180° about the origin. 


Answer: 


14. In each part, find a single matrix that performs the indicated succession of operations. 
(a) Reflects about the y-axis, then expands by a factor of 5 in the x-direction, and then reflects about y = x. 
(b) Rotates through 30° about the origin, then shears by a factor of —2 in the y-direction, and then expands by a 
factor of 3 in the y-direction. 
15. Use matrix inversion to show the following. 
(a) The inverse transformation for a reflection about y = x is a reflection about y = x. 
(b) The inverse transformation for a compression along an axis is an expansion along that axis. 
(c) The inverse transformation for a reflection about a coordinate axis is a reflection about that axis. 


(d) The inverse transformation for a shear along a coordinate axis is a shear along that axis. 
16. Find an equation of the image of the line $y = -4x + 3$ under multiplication by
4 =3 
A= 
17. In parts (a) through (e), find an equation of the image of the line »» — 2x under 
(a) ashear of factor 3 in the x-direction. 


(b) a compression of factor 5 in the y-direction. 


(c) areflection about y = x. 
(d) a reflection about the y-axis. 


(e) arotation of 60° about the origin. 


Answer: 

(a) y= Sx 

(b) Y= 

(c) yaor 

(d) y= —2x 

(e) ua - 8 aE ; 


18. Find the matrix for a shear in the x-direction that transforms the triangle with vertices (0, 0), (2, 1), and (3, 0) into a right triangle with the right angle at the origin.


19. (a) Show that multiplication by


maps each point in the plane onto the line $y = 2x$.


(b) It follows from part (a) that the noncollinear points (1, 0), (0, 1), (−1, 0) are mapped onto a line. Does this violate part (e) of Theorem 4.11.3?


Answer:


(b) No


20. Prove part (a) of Theorem 4.11.3. [Hint: A line in the plane has an equation of the form $Ax + By + C = 0$, where $A$ and $B$ are not both zero. Use the method of Example 6 to show that the image of this line under multiplication by the invertible matrix

\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

has the equation $A'x + B'y + C = 0$, where

\[
A' = (dA - cB)/(ad - bc)
\]

and

\[
B' = (-bA + aB)/(ad - bc)
\]

Then show that $A'$ and $B'$ are not both zero to conclude that the image is a line.]

21. Use the hint in Exercise 20 to prove parts (b) and (c) of Theorem 4.11.3.


22. In each part of the accompanying figure, find the standard matrix for the operator described.


Figure Ex-22 (panels (a), (b), (c))


23. In $R^3$ the shear in the xy-direction with factor $k$ is the matrix transformation that moves each point $(x, y, z)$ parallel to the xy-plane to the new position $(x + kz, y + kz, z)$. (See the accompanying figure.)
(a) Find the standard matrix for the shear in the xy-direction with factor $k$.

(b) How would you define the shear in the xz-direction with factor $k$ and the shear in the yz-direction with factor $k$? Find the standard matrices for these matrix transformations.


Figure Ex-23


Answer:

(a) $\begin{bmatrix} 1 & 0 & k \\ 0 & 1 & k \\ 0 & 0 & 1 \end{bmatrix}$

(b) Shear in the xz-direction with factor $k$ maps $(x, y, z)$ to $(x + ky, y, z + ky)$: $\begin{bmatrix} 1 & k & 0 \\ 0 & 1 & 0 \\ 0 & k & 1 \end{bmatrix}$

Shear in the yz-direction with factor $k$ maps $(x, y, z)$ to $(x, y + kx, z + kx)$: $\begin{bmatrix} 1 & 0 & 0 \\ k & 1 & 0 \\ k & 0 & 1 \end{bmatrix}$

True-False Exercises 

In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 

(a) The image of the unit square under a one-to-one matrix operator is a square. 
Answer: 


False 
(b) A 2 x 2 invertible matrix operator has the geometric effect of a succession of shears, compressions, expansions, and 
reflections. 


Answer: 


True 


(c) The image of a line under a one-to-one matrix operator is a line. 
Answer: 


True 


(d) Every reflection operator on 22 is its own inverse. 


Answer: 


True 


(©) The matrix : ; | represents reflection about a line. 


Answer: 
False 


@ The matrix E zi represents a shear. 


Answer: 


False 


(g) The matrix E | represents an expansion. 


Answer: 


True 




4.12 Dynamical Systems and Markov Chains 


In this optional section we will show how matrix methods can be used to analyze the behavior of physical systems that 
evolve over time. The methods that we will study here have been applied to problems in business, ecology, 
demographics, sociology, and most of the physical sciences. 


Dynamical Systems 


A dynamical system is a finite set of variables whose values change with time. The value of a variable at a point in time 
is called the state of the variable at that time, and the vector formed from these states is called the state of the 
dynamical system at that time. Our primary objective in this section is to analyze how the state of a dynamical system 
changes with time. Let us begin with an example. 


EXAMPLE 1 Market Share as a Dynamical System


Suppose that two competing television channels, channel 1 and channel 2, each have 50% of the viewer
market at some initial point in time. Assume that over each one-year period channel 1 captures 10% of
channel 2's share, and channel 2 captures 20% of channel 1's share (see Figure 4.12.1). What is each
channel's market share after one year?


Figure 4.12.1 Channel 1 loses 20% of its share and holds 80%; channel 2 loses 10% of its share and holds 90%.


Solution Let us begin by introducing the time-dependent variables

$x_1(t)$ = fraction of the market held by channel 1 at time $t$
$x_2(t)$ = fraction of the market held by channel 2 at time $t$

and the column vector

$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$$

whose first entry is channel 1's fraction of the market at time $t$ in years and whose second entry is channel 2's fraction of the market at time $t$ in years.

The variables $x_1(t)$ and $x_2(t)$ form a dynamical system whose state at time $t$ is the vector $\mathbf{x}(t)$. If we
take $t = 0$ to be the starting point at which the two channels each had 50% of the market, then the state of the
system at that time is

$$\mathbf{x}(0) = \begin{bmatrix} x_1(0) \\ x_2(0) \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} \tag{1}$$

Now let us try to find the state of the system at time $t = 1$ (one year later). Over the one-year period,
channel 1 retains 80% of its initial 50%, and it gains 10% of channel 2's initial 50%. Thus,

$$x_1(1) = 0.8(0.5) + 0.1(0.5) = 0.45 \tag{2}$$

Similarly, channel 2 gains 20% of channel 1's initial 50%, and retains 90% of its initial 50%. Thus,

$$x_2(1) = 0.2(0.5) + 0.9(0.5) = 0.55 \tag{3}$$

Therefore, the state of the system at time $t = 1$ is

$$\mathbf{x}(1) = \begin{bmatrix} x_1(1) \\ x_2(1) \end{bmatrix} = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix} \tag{4}$$

in which the first entry is channel 1's fraction of the market at time $t = 1$ and the second entry is channel 2's.


EXAMPLE 2 Evolution of Market Share over Five Years 


Track the market shares of channels 1 and 2 in Example 1 over a five-year period.


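Solution To solve this problem, suppose that we have already computed the market share of each
channel at time $t = k$ and we are interested in using the known values of $x_1(k)$ and $x_2(k)$ to compute the
market shares $x_1(k+1)$ and $x_2(k+1)$ one year later. The analysis is exactly the same as that used to
obtain Equations 2 and 3. Over the one-year period, channel 1 retains 80% of its starting fraction $x_1(k)$
and gains 10% of channel 2's starting fraction $x_2(k)$. Thus,

$$x_1(k+1) = (0.8)\,x_1(k) + (0.1)\,x_2(k) \tag{5}$$

Similarly, channel 2 gains 20% of channel 1's starting fraction $x_1(k)$ and retains 90% of its own starting
fraction $x_2(k)$. Thus,

$$x_2(k+1) = (0.2)\,x_1(k) + (0.9)\,x_2(k) \tag{6}$$

Equations 5 and 6 can be expressed in matrix form as

$$\begin{bmatrix} x_1(k+1) \\ x_2(k+1) \end{bmatrix} = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \begin{bmatrix} x_1(k) \\ x_2(k) \end{bmatrix} \tag{7}$$

which provides a way of using matrix multiplication to compute the state of the system at time $t = k + 1$
from the state at time $t = k$. For example, using 1 and 7 we obtain

$$\mathbf{x}(1) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \mathbf{x}(0) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix}$$

which agrees with 4. Similarly,

$$\mathbf{x}(2) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \mathbf{x}(1) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix} = \begin{bmatrix} 0.415 \\ 0.585 \end{bmatrix}$$

We can now continue this process, using Formula 7 to compute $\mathbf{x}(3)$ from $\mathbf{x}(2)$, then $\mathbf{x}(4)$ from $\mathbf{x}(3)$,
and so on. This yields (verify)

$$\mathbf{x}(3) = \begin{bmatrix} 0.3905 \\ 0.6095 \end{bmatrix}, \quad \mathbf{x}(4) = \begin{bmatrix} 0.37335 \\ 0.62665 \end{bmatrix}, \quad \mathbf{x}(5) = \begin{bmatrix} 0.361345 \\ 0.638655 \end{bmatrix} \tag{8}$$

Thus, after five years, channel 1 will hold about 36% of the market and channel 2 will hold about 64% of
the market.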
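The year-by-year computation in Formula 7 can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `mat_vec` and `track_states` are our own and do not come from the text, and the matrix is stored as a list of rows.

```python
# Transition data taken from Formulas 1 and 7 of the text.
P = [[0.8, 0.1],
     [0.2, 0.9]]
x0 = [0.5, 0.5]


def mat_vec(A, v):
    """Return the matrix-vector product A v (A is a list of rows)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def track_states(P, x0, steps):
    """Return [x(0), x(1), ..., x(steps)], where x(k+1) = P x(k)."""
    states = [list(x0)]
    for _ in range(steps):
        states.append(mat_vec(P, states[-1]))
    return states


states = track_states(P, x0, 5)
for k, x in enumerate(states):
    # Round only for display; the stored values keep full precision.
    print(k, [round(c, 6) for c in x])
```

The last line printed corresponds to x(5) in Formula 8.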


If desired, we can continue the market analysis in the last example beyond the five-year period and explore what
happens to the market share over the long term. We did so, using a computer, and obtained the following state vectors
(rounded to six decimal places):

$$\mathbf{x}(10) \approx \begin{bmatrix} 0.338041 \\ 0.661959 \end{bmatrix}, \quad \mathbf{x}(20) \approx \begin{bmatrix} 0.333466 \\ 0.666534 \end{bmatrix}, \quad \mathbf{x}(40) \approx \begin{bmatrix} 0.333333 \\ 0.666667 \end{bmatrix} \tag{9}$$

All subsequent state vectors, when rounded to six decimal places, are the same as $\mathbf{x}(40)$, so we see that the market
shares eventually stabilize with channel 1 holding about one-third of the market and channel 2 holding about
two-thirds. Later in this section, we will explain why this stabilization occurs.
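A computation of this kind might look like the following sketch. It is not the authors' program; `state_after` is a hypothetical helper that simply applies Formula 7 repeatedly in floating-point arithmetic.

```python
P = [[0.8, 0.1],
     [0.2, 0.9]]
x0 = [0.5, 0.5]


def mat_vec(A, v):
    """Return the matrix-vector product A v (A is a list of rows)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def state_after(P, x0, k):
    """Return x(k), obtained by applying x -> P x exactly k times."""
    x = list(x0)
    for _ in range(k):
        x = mat_vec(P, x)
    return x


# State vectors at years 10, 20, and 40, rounded to six decimal places.
long_term = {k: [round(c, 6) for c in state_after(P, x0, k)]
             for k in (10, 20, 40)}
for k, x in long_term.items():
    print(k, x)
```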


Markov Chains 


In many dynamical systems the states of the variables are not known with certainty but can be expressed as 
probabilities; such dynamical systems are called stochastic processes (from the Greek word stokastikos, meaning 
“proceeding by guesswork’’). A detailed study of stochastic processes requires a precise definition of the term 
probability, which is outside the scope of this course. However, the following interpretation will suffice for our present 
purposes: 


Stated informally, the probability that an experiment or observation will have a certain outcome is 
approximately the fraction of the time that the outcome would occur if the experiment were to be repeated many 
times under constant conditions—the greater the number of repetitions, the more accurately the probability 
describes the fraction of occurrences. 


For example, when we say that the probability of tossing heads with a fair coin is 1/2, we mean that if the coin were
tossed many times under constant conditions, then we would expect about half of the outcomes to be heads.
Probabilities are often expressed as decimals or percentages. Thus, the probability of tossing heads with a fair coin can 
also be expressed as 0.5 or 50%. 


If an experiment or observation has n possible outcomes, then the probabilities of those outcomes must be nonnegative
fractions whose sum is 1. The probabilities are nonnegative because each describes the fraction of occurrences of an
outcome over the long term, and the sum is 1 because they account for all possible outcomes. For example, if a box
containing 10 balls has one red ball, three green balls, and six yellow balls, and if a ball is drawn at random from the
box, then the probabilities of the various outcomes are

$p_1$ = prob(red) = 1/10 = 0.1
$p_2$ = prob(green) = 3/10 = 0.3
$p_3$ = prob(yellow) = 6/10 = 0.6

Each probability is a nonnegative fraction and

$$p_1 + p_2 + p_3 = 0.1 + 0.3 + 0.6 = 1$$


In a stochastic process with n possible states, the state vector at each time $t$ has the form

$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{bmatrix}$$

in which $x_i(t)$ is the probability that the system is in state $i$ at time $t$.

The entries in this vector must add up to 1 since they account for all n possibilities. In general, a vector with
nonnegative entries that add up to 1 is called a probability vector.
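The definition of a probability vector translates directly into a small check. The sketch below is illustrative only: the function name `is_probability_vector` and the tolerance `tol` (which absorbs floating-point roundoff in the sum) are assumptions of ours, not part of the text.

```python
def is_probability_vector(v, tol=1e-12):
    """Return True if all entries of v are nonnegative and they sum to 1.

    The tolerance is an assumption made here to absorb floating-point
    roundoff; with exact fractions it could be set to 0.
    """
    return all(c >= 0 for c in v) and abs(sum(v) - 1) <= tol


# The ball-drawing example: prob(red), prob(green), prob(yellow).
balls = [0.1, 0.3, 0.6]
print(is_probability_vector(balls))

# Entries that do not sum to 1, or that are negative, fail the check.
print(is_probability_vector([0.5, 0.6]))
print(is_probability_vector([1.5, -0.5]))
```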


EXAMPLE 3 Example 1 Revisited from the Probability Viewpoint 


Observe that the state vectors in Example 1 and Example 2 are all probability vectors. This is to be
expected since the entries in each state vector are the fractional market shares of the channels, and together
they account for the entire market. In practice, it is preferable to interpret the entries in the state vectors as
probabilities rather than exact market fractions, since market information is usually obtained by statistical
sampling procedures with intrinsic uncertainties. Thus, for example, the state vector

$$\mathbf{x}(1) = \begin{bmatrix} x_1(1) \\ x_2(1) \end{bmatrix} = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix}$$

which we interpreted in Example 1 to mean that channel 1 has 45% of the market and channel 2 has 55%,
can also be interpreted to mean that an individual picked at random from the market will be a channel 1
viewer with probability 0.45 and a channel 2 viewer with probability 0.55.


A square matrix, each of whose columns is a probability vector, is called a stochastic matrix. Such matrices commonly
occur in formulas that relate successive states of a stochastic process. For example, the state vectors $\mathbf{x}(k+1)$ and $\mathbf{x}(k)$
in 7 are related by an equation of the form $\mathbf{x}(k+1) = P\mathbf{x}(k)$ in which

$$P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \tag{10}$$

is a stochastic matrix. It should not be surprising that the column vectors of P are probability vectors, since the entries
in each column provide a breakdown of what happens to each channel's market share over the year: the entries in
column 1 convey that each year channel 1 retains 80% of its market share and loses 20%, and the entries in column 2
convey that each year channel 2 retains 90% of its market share and loses 10%. The entries in 10 can also be viewed as
probabilities:

$p_{11}$ = 0.8 = probability that a channel 1 viewer remains a channel 1 viewer
$p_{21}$ = 0.2 = probability that a channel 1 viewer becomes a channel 2 viewer
$p_{12}$ = 0.1 = probability that a channel 2 viewer becomes a channel 1 viewer
$p_{22}$ = 0.9 = probability that a channel 2 viewer remains a channel 2 viewer
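Checking whether a matrix is stochastic amounts to testing each column against the definition of a probability vector. The following is a minimal sketch under our own naming (`is_stochastic`, `is_probability_vector`) and with an assumed roundoff tolerance; matrices are stored as lists of rows.

```python
def is_probability_vector(v, tol=1e-12):
    """Nonnegative entries that sum to 1 (up to an assumed tolerance)."""
    return all(c >= 0 for c in v) and abs(sum(v) - 1) <= tol


def is_stochastic(A, tol=1e-12):
    """Return True if A is square and every column is a probability vector."""
    n = len(A)
    if any(len(row) != n for row in A):
        return False
    columns = [[A[i][j] for i in range(n)] for j in range(n)]
    return all(is_probability_vector(col, tol) for col in columns)


# The market-share matrix from Formula 10.
P = [[0.8, 0.1],
     [0.2, 0.9]]
print(is_stochastic(P))

# Rows that sum to 1 are not enough: here the columns sum to 0.7 and 1.3.
A = [[0.4, 0.6],
     [0.3, 0.7]]
print(is_stochastic(A))
```

Note that the test is on columns, not rows, in keeping with the convention used in this section.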


Example 1 is a special case of a large class of stochastic processes, called Markov chains.


Andrei Andreyevich Markov (1856-1922) 


Historical Note Markov chains are named in honor of the Russian mathematician A. A. Markov, a lover of 
poetry, who used them to analyze the alternation of vowels and consonants in the poem Eugene Onegin by 
Pushkin. Markov believed that the only applications of his chains were to the analysis of literary works, so he 
would be astonished to learn that his discovery is used today in the social sciences, quantum theory, and 
genetics! 



DEFINITION 1 


A Markov chain is a dynamical system whose state vectors at a succession of time intervals are probability
vectors and for which the state vectors at successive time intervals are related by an equation of the form

$$\mathbf{x}(k+1) = P\mathbf{x}(k)$$

in which $P = [p_{ij}]$ is a stochastic matrix and $p_{ij}$ is the probability that the system will be in state $i$ at time
$t = k + 1$ if it is in state $j$ at time $t = k$. The matrix P is called the transition matrix for the system.


Remark Note that in this definition the row index i corresponds to the later state and the column index j to the earlier 
state (Figure 4.12.2). 


Figure 4.12.2 The entry $p_{ij}$ is the probability that the system is in state $i$ at time $t = k + 1$ if it is in state $j$ at time $t = k$. (Columns are indexed by the state at time $t = k$; rows are indexed by the state at time $t = k + 1$.)


EXAMPLE 4 Wildlife Migration as a Markov Chain


Suppose that a tagged lion can migrate over three adjacent game reserves in search of food, reserve 1,
reserve 2, and reserve 3. Based on data about the food resources, researchers conclude that the monthly
migration pattern of the lion can be modeled by a Markov chain with transition matrix

$$P = \begin{bmatrix} 0.5 & 0.4 & 0.6 \\ 0.2 & 0.2 & 0.3 \\ 0.3 & 0.4 & 0.1 \end{bmatrix}$$

in which the columns are indexed by the reserve (1, 2, 3) at time $t = k$ and the rows by the reserve (1, 2, 3) at time $t = k + 1$ (see Figure 4.12.3). That is,

$p_{11}$ = 0.5 = probability that the lion will stay in reserve 1 when it is in reserve 1
$p_{12}$ = 0.4 = probability that the lion will move from reserve 2 to reserve 1
$p_{13}$ = 0.6 = probability that the lion will move from reserve 3 to reserve 1
$p_{21}$ = 0.2 = probability that the lion will move from reserve 1 to reserve 2
$p_{22}$ = 0.2 = probability that the lion will stay in reserve 2 when it is in reserve 2
$p_{23}$ = 0.3 = probability that the lion will move from reserve 3 to reserve 2
$p_{31}$ = 0.3 = probability that the lion will move from reserve 1 to reserve 3
$p_{32}$ = 0.4 = probability that the lion will move from reserve 2 to reserve 3
$p_{33}$ = 0.1 = probability that the lion will stay in reserve 3 when it is in reserve 3

Assuming that $t$ is in months and the lion is released in reserve 2 at time $t = 0$, track its probable
locations over a six-month period.


Figure 4.12.3 Transition diagram for the three reserves, with each arrow labeled by the corresponding entry of P.


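Solution Let $x_1(k)$, $x_2(k)$, and $x_3(k)$ be the probabilities that the lion is in reserve 1, 2, or 3,
respectively, at time $t = k$, and let

$$\mathbf{x}(k) = \begin{bmatrix} x_1(k) \\ x_2(k) \\ x_3(k) \end{bmatrix}$$

be the state vector at that time. Since we know with certainty that the lion is in reserve 2 at time $t = 0$, the
initial state vector is

$$\mathbf{x}(0) = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

We leave it for you to show that the state vectors over a six-month period are

$$\mathbf{x}(1) = P\mathbf{x}(0) = \begin{bmatrix} 0.400 \\ 0.200 \\ 0.400 \end{bmatrix}, \quad \mathbf{x}(2) = P\mathbf{x}(1) = \begin{bmatrix} 0.520 \\ 0.240 \\ 0.240 \end{bmatrix}, \quad \mathbf{x}(3) = P\mathbf{x}(2) = \begin{bmatrix} 0.500 \\ 0.224 \\ 0.276 \end{bmatrix}$$

$$\mathbf{x}(4) = P\mathbf{x}(3) \approx \begin{bmatrix} 0.505 \\ 0.228 \\ 0.267 \end{bmatrix}, \quad \mathbf{x}(5) = P\mathbf{x}(4) \approx \begin{bmatrix} 0.504 \\ 0.227 \\ 0.269 \end{bmatrix}, \quad \mathbf{x}(6) = P\mathbf{x}(5) \approx \begin{bmatrix} 0.504 \\ 0.227 \\ 0.269 \end{bmatrix}$$

As in Example 2, the state vectors here seem to stabilize over time with a probability of approximately
0.504 that the lion is in reserve 1, a probability of approximately 0.227 that it is in reserve 2, and a
probability of approximately 0.269 that it is in reserve 3.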
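The six state vectors left for the reader can be checked with a short script. The sketch below is only one way to do it; the names `mat_vec` and `lion_states` are ours, and the values are rounded to three decimal places as in the text.

```python
# Transition matrix and initial state from Example 4.
P = [[0.5, 0.4, 0.6],
     [0.2, 0.2, 0.3],
     [0.3, 0.4, 0.1]]
x0 = [0.0, 1.0, 0.0]  # the lion starts in reserve 2 with certainty


def mat_vec(A, v):
    """Return the matrix-vector product A v (A is a list of rows)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


# Build x(0), x(1), ..., x(6) by repeated multiplication by P.
lion_states = [x0]
for _ in range(6):
    lion_states.append(mat_vec(P, lion_states[-1]))

for k, x in enumerate(lion_states):
    print(k, [round(c, 3) for c in x])
```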


Markov Chains in Terms of Powers of the Transition Matrix 


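In a Markov chain with an initial state of $\mathbf{x}(0)$, the successive state vectors are

$$\mathbf{x}(1) = P\mathbf{x}(0), \quad \mathbf{x}(2) = P\mathbf{x}(1), \quad \mathbf{x}(3) = P\mathbf{x}(2), \quad \mathbf{x}(4) = P\mathbf{x}(3), \ldots$$

For brevity, it is common to denote $\mathbf{x}(k)$ by $\mathbf{x}_k$, which allows us to write the successive state vectors more briefly as

$$\mathbf{x}_1 = P\mathbf{x}_0, \quad \mathbf{x}_2 = P\mathbf{x}_1, \quad \mathbf{x}_3 = P\mathbf{x}_2, \quad \mathbf{x}_4 = P\mathbf{x}_3, \ldots \tag{11}$$

Alternatively, these state vectors can be expressed in terms of the initial state vector $\mathbf{x}_0$ as

$$\mathbf{x}_1 = P\mathbf{x}_0, \quad \mathbf{x}_2 = P(P\mathbf{x}_0) = P^2\mathbf{x}_0, \quad \mathbf{x}_3 = P(P^2\mathbf{x}_0) = P^3\mathbf{x}_0, \quad \mathbf{x}_4 = P(P^3\mathbf{x}_0) = P^4\mathbf{x}_0, \ldots$$

from which it follows that

$$\mathbf{x}_k = P^k\mathbf{x}_0 \tag{12}$$

Note that Formula 12 makes it possible to compute the state vector $\mathbf{x}_k$ without first computing the
earlier state vectors as required in Formula 11.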


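EXAMPLE 5 Finding a State Vector Directly from $\mathbf{x}_0$


Use Formula 12 to find the state vector $\mathbf{x}(3)$ in Example 2.


Solution From 1 and 7, the initial state vector and transition matrix are

$$\mathbf{x}_0 = \mathbf{x}(0) = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} \quad\text{and}\quad P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}$$

We leave it for you to calculate $P^3$ and show that

$$\mathbf{x}(3) = \mathbf{x}_3 = P^3\mathbf{x}_0 = \begin{bmatrix} 0.562 & 0.219 \\ 0.438 & 0.781 \end{bmatrix} \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} = \begin{bmatrix} 0.3905 \\ 0.6095 \end{bmatrix}$$

which agrees with the result in 8.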
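Formula 12 can be tried out numerically as follows. This is a sketch with hypothetical helpers (`mat_mul`, `mat_pow`, `mat_vec`); it forms the power by repeated multiplication, which is adequate for small exponents.

```python
P = [[0.8, 0.1],
     [0.2, 0.9]]
x0 = [0.5, 0.5]


def mat_mul(A, B):
    """Return the product AB of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, k):
    """Return A**k for an integer k >= 1 by repeated multiplication."""
    result = A
    for _ in range(k - 1):
        result = mat_mul(result, A)
    return result


def mat_vec(A, v):
    """Return the matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


P3 = mat_pow(P, 3)
x3 = mat_vec(P3, x0)          # Formula 12 with k = 3
print([[round(e, 3) for e in row] for row in P3])
print([round(c, 4) for c in x3])
```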


Long-Term Behavior of a Markov Chain 


We have seen two examples of Markov chains in which the state vectors seem to stabilize after a period of time. Thus, 
it is reasonable to ask whether all Markov chains have this property. The following example shows that this is not the 
case. 


EXAMPLE 6 A Markov Chain That Does Not Stabilize


The matrix

$$P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

is stochastic and hence can be regarded as the transition matrix for a Markov chain. A simple calculation
shows that $P^2 = I$, from which it follows that

$$I = P^2 = P^4 = P^6 = \cdots \quad\text{and}\quad P = P^3 = P^5 = P^7 = \cdots$$

Thus, the successive states in the Markov chain with initial vector $\mathbf{x}_0$ are

$$\mathbf{x}_0, \; P\mathbf{x}_0, \; \mathbf{x}_0, \; P\mathbf{x}_0, \; \mathbf{x}_0, \ldots$$

which oscillate between $\mathbf{x}_0$ and $P\mathbf{x}_0$. Thus, the Markov chain does not stabilize unless both components
of $\mathbf{x}_0$ are 1/2 (verify).
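The oscillation can be observed directly. The sketch below assumes the matrix P = [[0, 1], [1, 0]] (the 2 × 2 stochastic matrix other than I whose square is I) and uses exact fractions; the initial vector (1/4, 3/4) is an arbitrary choice of ours for illustration.

```python
from fractions import Fraction

P = [[0, 1],
     [1, 0]]


def mat_vec(A, v):
    """Return the matrix-vector product A v (A is a list of rows)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def first_states(P, x0, count):
    """Return the first `count` state vectors x0, P x0, P^2 x0, ..."""
    states = [list(x0)]
    while len(states) < count:
        states.append(mat_vec(P, states[-1]))
    return states


# An arbitrary initial probability vector: the states swap back and forth.
oscillating = first_states(P, [Fraction(1, 4), Fraction(3, 4)], 5)
print(oscillating)

# With both components equal to 1/2 the state never changes.
fixed = first_states(P, [Fraction(1, 2), Fraction(1, 2)], 5)
print(fixed)
```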


A precise definition of what it means for a sequence of numbers or vectors to stabilize is given in calculus; however,
that level of precision will not be needed here. Stated informally, we will say that a sequence of vectors

$$\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k, \ldots$$

approaches a limit $\mathbf{q}$ or that it converges to $\mathbf{q}$ if all entries in $\mathbf{x}_k$ can be made as close as we like to the corresponding
entries in the vector $\mathbf{q}$ by taking $k$ sufficiently large. We denote this by writing $\mathbf{x}_k \to \mathbf{q}$ as $k \to \infty$.


We saw in Example 6 that the state vectors of a Markov chain need not approach a limit in all cases. However, by
imposing a mild condition on the transition matrix of a Markov chain, we can guarantee that the state vectors will
approach a limit.


DEFINITION 2 


A stochastic matrix P is said to be regular if P or some positive power of P has all positive entries, and a 
Markov chain whose transition matrix is regular is said to be a regular Markov chain. 


EXAMPLE 7 Regular Stochastic Matrices 


The transition matrices in Example 2 and Example 4 are regular because their entries are positive. The
matrix

$$P = \begin{bmatrix} 0.5 & 1 \\ 0.5 & 0 \end{bmatrix}$$

is regular because

$$P^2 = \begin{bmatrix} 0.75 & 0.5 \\ 0.25 & 0.5 \end{bmatrix}$$

has positive entries. The matrix P in Example 6 is not regular because P and every positive power of P
have some zero entries (verify).
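Regularity can be tested mechanically by examining powers of P. The sketch below is our own and rests on two assumptions that go beyond the text: only the pattern of zero and nonzero entries matters, so the powers are tracked as 0/1 matrices, and for an n × n matrix it is enough to examine the powers up to (n − 1)² + 1, a standard bound due to Wielandt. The input is assumed to be a stochastic matrix already.

```python
def mat_mul(A, B):
    """Return the product AB of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_regular(P):
    """Return True if some power P, P^2, ... has all positive entries.

    Assumes P is stochastic (in particular, nonnegative). Powers are
    examined up to exponent (n - 1)**2 + 1 (Wielandt's bound).
    """
    n = len(P)
    pattern = [[1 if e > 0 else 0 for e in row] for row in P]
    power = pattern                       # zero/nonzero pattern of P^1
    for _ in range((n - 1) ** 2 + 1):
        if all(e > 0 for row in power for e in row):
            return True
        product = mat_mul(power, pattern)  # pattern of the next power
        power = [[1 if e > 0 else 0 for e in row] for row in product]
    return False


print(is_regular([[0.5, 1], [0.5, 0]]))      # matrix from this example
print(is_regular([[0, 1], [1, 0]]))          # matrix from Example 6
print(is_regular([[0.8, 0.1], [0.2, 0.9]]))  # matrix from Example 2
```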


The following theorem, which we state without proof, is the fundamental result about the long-term behavior of 
Markov chains. 


THEOREM 4.12.1 


If P is the transition matrix for a regular Markov chain, then:

(a) There is a unique probability vector $\mathbf{q}$ such that $P\mathbf{q} = \mathbf{q}$.

(b) For any initial probability vector $\mathbf{x}_0$, the sequence of state vectors

$$\mathbf{x}_0, \; P\mathbf{x}_0, \; \ldots, \; P^k\mathbf{x}_0, \; \ldots$$

converges to $\mathbf{q}$.


The vector $\mathbf{q}$ in this theorem is called the steady-state vector of the Markov chain. It can be found by rewriting the
equation in part (a) as

$$(I - P)\mathbf{q} = \mathbf{0}$$

and then solving this equation for $\mathbf{q}$ subject to the requirement that $\mathbf{q}$ be a probability vector. Here are some examples.


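EXAMPLE 8 Example 1 and Example 2 Revisited


The transition matrix for the Markov chain in Example 2 is

$$P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}$$

Since the entries of P are positive, the Markov chain is regular and hence has a unique steady-state vector
$\mathbf{q}$. To find $\mathbf{q}$ we will solve the system $(I - P)\mathbf{q} = \mathbf{0}$, which we can write as

$$\begin{bmatrix} 0.2 & -0.1 \\ -0.2 & 0.1 \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

The general solution of this system is

$$q_1 = 0.5s, \quad q_2 = s$$

(verify), which we can write in vector form as

$$\mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = \begin{bmatrix} 0.5s \\ s \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2}s \\ s \end{bmatrix} \tag{13}$$

For $\mathbf{q}$ to be a probability vector, we must have

$$1 = q_1 + q_2 = \tfrac{3}{2}s$$

which implies that $s = \tfrac{2}{3}$. Substituting this value in 13 yields the steady-state vector

$$\mathbf{q} = \begin{bmatrix} \tfrac{1}{3} \\ \tfrac{2}{3} \end{bmatrix}$$

which is consistent with the numerical results obtained in 9.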
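The procedure just carried out by hand can be sketched in code. The version below is an assumption-laden illustration rather than the text's method verbatim: it uses exact fractions, replaces the last equation of (I − P)q = 0 by the normalization q₁ + ⋯ + qₙ = 1 (one equation of the homogeneous system is redundant for a regular chain), and then applies Gauss-Jordan elimination. The name `steady_state` is hypothetical.

```python
from fractions import Fraction


def steady_state(P):
    """Return the steady-state vector of a regular stochastic matrix P.

    P should contain Fractions (or integers) so the arithmetic is exact.
    Raises StopIteration if no pivot is found, which should not happen
    for a regular chain.
    """
    n = len(P)
    # Augmented matrix [I - P | 0].
    A = [[(Fraction(1) if i == j else Fraction(0)) - Fraction(P[i][j])
          for j in range(n)] + [Fraction(0)] for i in range(n)]
    # Replace the last equation by q_1 + ... + q_n = 1.
    A[-1] = [Fraction(1)] * n + [Fraction(1)]
    # Gauss-Jordan elimination.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [e / pivot_value for e in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


P = [[Fraction(8, 10), Fraction(1, 10)],
     [Fraction(2, 10), Fraction(9, 10)]]
q = steady_state(P)
print(q)
```

Exact fractions are used here so that the result can be compared directly with the hand computation; with floating-point entries the same elimination would return approximations.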


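EXAMPLE 9 Example 4 Revisited


The transition matrix for the Markov chain in Example 4 is

$$P = \begin{bmatrix} 0.5 & 0.4 & 0.6 \\ 0.2 & 0.2 & 0.3 \\ 0.3 & 0.4 & 0.1 \end{bmatrix}$$

Since the entries of P are positive, the Markov chain is regular and hence has a unique steady-state vector
$\mathbf{q}$. To find $\mathbf{q}$ we will solve the system $(I - P)\mathbf{q} = \mathbf{0}$, which we can write (using fractions) as

$$\begin{bmatrix} \tfrac{1}{2} & -\tfrac{2}{5} & -\tfrac{3}{5} \\ -\tfrac{1}{5} & \tfrac{4}{5} & -\tfrac{3}{10} \\ -\tfrac{3}{10} & -\tfrac{2}{5} & \tfrac{9}{10} \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{14}$$

(We have converted to fractions to avoid roundoff error in this illustrative example.) We leave it for you
to confirm that the reduced row echelon form of the coefficient matrix is

$$\begin{bmatrix} 1 & 0 & -\tfrac{15}{8} \\ 0 & 1 & -\tfrac{27}{32} \\ 0 & 0 & 0 \end{bmatrix}$$

and that the general solution of 14 is

$$q_1 = \tfrac{15}{8}s, \quad q_2 = \tfrac{27}{32}s, \quad q_3 = s \tag{15}$$

For $\mathbf{q}$ to be a probability vector we must have $q_1 + q_2 + q_3 = 1$, from which it follows that $s = \tfrac{32}{119}$
(verify). Substituting this value in 15 yields the steady-state vector

$$\mathbf{q} = \begin{bmatrix} \tfrac{60}{119} \\ \tfrac{27}{119} \\ \tfrac{32}{119} \end{bmatrix} \approx \begin{bmatrix} 0.5042 \\ 0.2269 \\ 0.2689 \end{bmatrix}$$

(verify), which is consistent with the results obtained in Example 4.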
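Both "verify" steps can be checked with exact arithmetic. The following sketch (with variable names of our own choosing) confirms that q = (60/119, 27/119, 32/119) is a probability vector satisfying Pq = q, and then measures how close the iterates from Example 4 come to it after 20 months; the choice of 20 steps is arbitrary.

```python
from fractions import Fraction as F

# Transition matrix of Example 4 written with exact fractions.
P = [[F(1, 2), F(2, 5), F(3, 5)],
     [F(1, 5), F(1, 5), F(3, 10)],
     [F(3, 10), F(2, 5), F(1, 10)]]
q = [F(60, 119), F(27, 119), F(32, 119)]


def mat_vec(A, v):
    """Return the matrix-vector product A v (A is a list of rows)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


# q is a probability vector and is left unchanged by P.
is_probability = all(c >= 0 for c in q) and sum(q) == 1
is_fixed = mat_vec(P, q) == q

# Start from x(0) = (0, 1, 0) as in Example 4 and iterate 20 times.
x = [F(0), F(1), F(0)]
for _ in range(20):
    x = mat_vec(P, x)
max_gap = max(abs(a - b) for a, b in zip(x, q))

print(is_probability, is_fixed)
print([round(float(c), 4) for c in q])
print(float(max_gap))
```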


Concept Review

• Dynamical system
• State of a variable
• State of a dynamical system
• Stochastic process
• Probability
• Probability vector
• Stochastic matrix
• Markov chain
• Transition matrix
• Regular stochastic matrix
• Regular Markov chain
• Steady-state vector


Skills

• Determine whether a matrix is stochastic.
• Compute the state vectors from a transition matrix and an initial state.
• Determine whether a stochastic matrix is regular.
• Determine whether a Markov chain is regular.
• Find the steady-state vector for a regular transition matrix.


Exercise Set 4.12 


In Exercises 1—2, determine whether A is a stochastic matrix. If A is not stochastic, then explain why not. 


1 
*(a) ,_|94 0.3 
a=(96 | 


(b) ,_ [04 0.6 
a=|95 | 


(c) ,; i1 
2 3 
A=|0 0 1 
3 
ce 
05 3 
(d) oe es | 
3 3 2 
oy ie ae 
A=|¢ 3 2 
fd 
23 ~«! 
Answer: 


(a) Stochastic 
(b) Not stochastic 
(c) Stochastic 
(d) Not stochastic 


(b) 4_|¥: 
4=(05 01 
(c) os ee a 
12 9 6 
=e (pe 2 
A=|5 0 2 
> 8 
ao 
Ge lag ay a 
3 2 
= Doel 
AS |e 333 
1 
3. 20 


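In Exercises 3–4, use Formulas 11 and 12 to compute the state vector $\mathbf{x}_4$ in two different ways.

3. $P = \begin{bmatrix} 0.5 & 0.6 \\ 0.5 & 0.4 \end{bmatrix}$; $\mathbf{x}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$

Answer:

$$\mathbf{x}_4 = \begin{bmatrix} 0.54545 \\ 0.45455 \end{bmatrix}$$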
Y eee (eG ae _ fl 
P=[9) i ~=| | 


In Exercises 5—6, determine whether P is a regular stochastic matrix. 


Sey “fd 
Ds Sf 
P= 
4% 
5 7 
) [19 
5 
P= 
a: i 
5 
© [14 
P= 
4 9 
5 
Answer: 
(a) Regular 
(b) Not regular 
(c) Regular 
6. (a) a4 
2 
P= 
1 9 
2 
6) |, 2 
3 
P= 
9 1 
3 
(c) 3. i 
4 3 
TD 
4 3 


In Exercises 7-10, verify that P is a regular stochastic matrix, and find the steady-state vector for the associated 
Markov chain. 


7: 12 
4 3 
ale ae 
4 3 
Answer 
8 
1? 
a 
lV? 


8 »_ [0.2 0.6 
P= A 


As 
22 9 
ms (> ee be 
=a 5.3 
A. 2 
a 3 
Answer 
4 
11 
4. 
11 
2 
11 
10. ae ees 
3 4 5 
3 2 
P=|0 47% 
2 1 
2 


11. Consider a Markov process with transition matrix

             State 1   State 2
   State 1  [  0.2       0.1  ]
   State 2  [  0.8       0.9  ]

(a) What does the entry 0.2 represent?
(b) What does the entry 0.1 represent?
(c) If the system is in state 1 initially, what is the probability that it will be in state 2 at the next observation?
(d) If the system has a 50% chance of being in state 1 initially, what is the probability that it will be in state 2 at the
next observation?


Answer:

(a) Probability that something in state 1 stays in state 1
(b) Probability that something in state 2 moves to state 1
(c) 0.8
(d) 0.85


12. Consider a Markov process with transition matrix 
State 1 State 2 


State 1 
State 2 1 


(nA ~JJo 


(a) What does the entry g represent? 


(b) What does the entry 0 represent? 


(c) If the system is in state 1 initially, what is the probability that it will be in state 1 at the next observation?
(d) If the system has a 50% chance of being in state 1 initially, what is the probability that it will be in state 2 at the
next observation?


13. On a given day the air quality in a certain city is either good or bad. Records show that when the air quality is good
on one day, then there is a 95% chance that it will be good the next day, and when the air quality is bad on one day,
then there is a 45% chance that it will be bad the next day.

(a) Find a transition matrix for this phenomenon.
(b) If the air quality is good today, what is the probability that it will be good two days from now?
(c) If the air quality is bad today, what is the probability that it will be bad three days from now?
(d) If there is a 20% chance that the air quality will be good today, what is the probability that it will be good
tomorrow?


Answer:

(a) $\begin{bmatrix} 0.95 & 0.55 \\ 0.05 & 0.45 \end{bmatrix}$
(b) 0.93
(c) 0.142
(d) 0.63


14. In a laboratory experiment, a mouse can choose one of two food types each day, type I or type II. Records show that
if the mouse chooses type I on a given day, then there is a 75% chance that it will choose type I the next day, and if
it chooses type II on one day, then there is a 50% chance that it will choose type II the next day.

(a) Find a transition matrix for this phenomenon.
(b) If the mouse chooses type I today, what is the probability that it will choose type I two days from now?
(c) If the mouse chooses type II today, what is the probability that it will choose type II three days from now?
(d) If there is a 10% chance that the mouse will choose type I today, what is the probability that it will choose type
I tomorrow?


15. Suppose that at some initial point in time 100,000 people live in a certain city and 25,000 people live in its suburbs.
The Regional Planning Commission determines that each year 5% of the city population moves to the suburbs and
3% of the suburban population moves to the city.

(a) Assuming that the total population remains constant, make a table that shows the populations of the city and its
suburbs over a five-year period (round to the nearest integer).
(b) Over the long term, how will the population be distributed between the city and its suburbs?


Answer:

(a)
Year        1        2        3        4        5
City      95,750   91,840   88,243   84,933   81,889
Suburbs   29,250   33,160   36,757   40,067   43,111

(b)
City      46,875
Suburbs   78,125


16. Suppose that two competing television stations, station 1 and station 2, each have 50% of the viewer market at some
initial point in time. Assume that over each one-year period station 1 captures 5% of station 2's market share and
station 2 captures 10% of station 1's market share.

(a) Make a table that shows the market share of each station over a five-year period.
(b) Over the long term, how will the market share be distributed between the two stations?


17. Suppose that a car rental agency has three locations, numbered 1, 2, and 3. A customer may rent a car from any of
the three locations and return it to any of the three locations. Records show that cars are rented and returned in
accordance with the following probabilities:


Rented from Location 


Returned to Location 2 


(a) Assuming that a car is rented from location 1, what is the probability that it will be at location 1 after two
rentals?
(b) Assuming that this dynamical system can be modeled as a Markov chain, find the steady-state vector.
(c) If the rental agency owns 120 cars, how many parking spaces should it allocate at each location to be
reasonably certain that it will have enough spaces for the cars over the long term? Explain your reasoning.


Answer:

(a) $\tfrac{23}{100}$

(b) $\begin{bmatrix} \tfrac{46}{159} \\ \tfrac{22}{53} \\ \tfrac{47}{159} \end{bmatrix}$

(c) 35, 50, 35

18. Physical traits are determined by the genes that an offspring receives from its parents. In the simplest case a trait in
the offspring is determined by one pair of genes, one member of the pair inherited from the male parent and the
other from the female parent. Typically, each gene in a pair can assume one of two forms, called alleles, denoted by
A and a. This leads to three possible pairings:

AA, Aa, aa

called genotypes (the pairs Aa and aA determine the same trait and hence are not distinguished from one another). It
is shown in the study of heredity that if a parent of known genotype is crossed with a random parent of unknown
genotype, then the offspring will have the genotype probabilities given in the following table, which can be viewed
as a transition matrix for a Markov process:

Genotype of Parent (columns): AA, Aa, aa
Genotype of Offspring (rows): AA, Aa, aa

Thus, for example, the offspring of a parent of genotype AA that is crossed at random with a parent of unknown
genotype will have a 50% chance of being AA, a 50% chance of being Aa, and no chance of being aa.

(a) Show that the transition matrix is regular.
(b) Find the steady-state vector, and discuss its physical interpretation.


19. Fill in the missing entries of the stochastic matrix


# Ul 


— 
ole 


and find its steady-state vector. 


Answer: 
Ri cepa’ eg 1 
10 10 «5 3 
i [bot Sie ome ny ere 
BES. Sigs Pa 
2 er a 2 A: 
10 5) «10 3 
20. If P is an n × n stochastic matrix, and if M is a 1 × n matrix whose entries are all 1's, then MP = ______.


21. If P is a regular stochastic matrix with steady-state vector $\mathbf{q}$, what can you say about the sequence of products

$$P\mathbf{q}, \; P^2\mathbf{q}, \; P^3\mathbf{q}, \; \ldots, \; P^k\mathbf{q}, \; \ldots$$

as $k \to \infty$?

Answer:

$P^k\mathbf{q} = \mathbf{q}$ for every positive integer k


22. (a) If P is a regular n × n stochastic matrix with steady-state vector $\mathbf{q}$, and if $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are the standard unit
vectors in column form, what can you say about the behavior of the sequence

$$P\mathbf{e}_i, \; P^2\mathbf{e}_i, \; P^3\mathbf{e}_i, \; \ldots, \; P^k\mathbf{e}_i, \; \ldots$$

as $k \to \infty$ for each $i = 1, 2, \ldots, n$?

(b) What does this tell you about the behavior of the column vectors of $P^k$ as $k \to \infty$?


23. Prove that the product of two stochastic matrices is a stochastic matrix. [Hint: Write each column of the product as
a linear combination of the columns of the first factor.]


24. Prove that if P is a stochastic matrix whose entries are all greater than or equal to $\rho$, then the entries of $P^2$ are
greater than or equal to $\rho$.


True-False Exercises 
In parts (a)—(e) determine whether the statement is true or false, and justify your answer. 
(a) 


The vector is a probability vector. 


WIP Oo Wile 


Answer: 
True 


(b) The matrix ke | is a regular stochastic matrix. 


Answer: 


True 


(c) The column vectors of a transition matrix are probability vectors. 
Answer: 


True 


(d) A steady-state vector for a Markov chain with transition matrix P is any solution of the linear system $(I - P)\mathbf{q} = \mathbf{0}$.
Answer: 


False 


(e) The square of every regular stochastic matrix is stochastic. 
Answer: 


True 




Chapter 4 Supplementary Exercises 


1. Let V be the set of all ordered triples of real numbers, and consider the following addition and scalar
multiplication operations on $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$:

$$\mathbf{u} + \mathbf{v} = (u_1 + v_1, \; u_2 + v_2, \; u_3 + v_3), \qquad k\mathbf{u} = (ku_1, 0, 0)$$

(a) Compute $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ for $\mathbf{u} = (3, -2, 4)$, $\mathbf{v} = (1, 5, -2)$, and $k = -1$.

(b) In words, explain why V is closed under addition and scalar multiplication.

(c) Since the addition operation on V is the standard addition operation on $R^3$, certain vector space axioms
hold for V because they are known to hold for $R^3$. Which axioms in Definition 1 of Section 4.1 are
they?

(d) Show that Axioms 7, 8, and 9 hold.

(e) Show that Axiom 10 fails for the given operations.


Answer:

(a) $\mathbf{u} + \mathbf{v} = (4, 3, 2)$, $-\mathbf{u} = (-3, 0, 0)$
(c) Axioms 1–5


2. In each part, the solution space of the system is a subspace of $R^3$ and so must be a line through the origin,
a plane through the origin, all of $R^3$, or the origin only. For each system, determine which is the case. If
the subspace is a plane, find an equation for it, and if it is a line, find parametric equations.

(a) 0x + 0y + 0z = 0

(b) 2x − 3y + z = 0
    6x − 9y + 3z = 0
    −4x + 6y − 2z = 0

(c) x − 2y + 7z = 0
    −4x + 8y + 5z = 0
    2x − 4y + 3z = 0

(d) x + 4y + 8z = 0
    2x + 5y + 6z = 0
    3x + y − 4z = 0

3. For what values of s is the solution space of

    $x_1 + x_2 + s x_3 = 0$
    $x_1 + s x_2 + x_3 = 0$
    $s x_1 + x_2 + x_3 = 0$

the origin only, a line through the origin, a plane through the origin, or all of $R^3$?


Answer:

If $s \neq 1, -2$, the solution space is the origin. If $s = 1$, the solution space is a plane through the origin. If
$s = -2$, the solution space is a line through the origin.
4. (a) Express (4a, a − b, a + 2b) as a linear combination of (4, 1, 1) and (0, −1, 2).

(b) Express (3a + b + 3c, −a + 4b − c, 2a + b + 2c) as a linear combination of (3, −1, 2) and
(1, 4, 1).

(c) Express (2a − b + 4c, 3a − c, 4b + c) as a linear combination of three nonzero vectors.


5. Let W be the space spanned by $\mathbf{f} = \sin x$ and $\mathbf{g} = \cos x$.
(a) Show that for any value of $\theta$, $\mathbf{f}_1 = \sin(x + \theta)$ and $\mathbf{g}_1 = \cos(x + \theta)$ are vectors in W.
(b) Show that $\mathbf{f}_1$ and $\mathbf{g}_1$ form a basis for W.


6. (a) Express $\mathbf{v} = (1, 1)$ as a linear combination of $\mathbf{v}_1 = (1, -1)$, $\mathbf{v}_2 = (3, 0)$, and $\mathbf{v}_3 = (2, 1)$ in two
different ways.


(b) Explain why this does not violate Theorem 4.4.1. 


7. Let A be an n × n matrix, and let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ be linearly independent vectors in $R^n$ expressed as n × 1
matrices. What must be true about A for $A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_n$ to be linearly independent?


Answer:

A must be invertible


8. Must a basis for $P_n$ contain a polynomial of degree k for each k = 0, 1, 2, ..., n? Justify your answer.


9. For the purpose of this exercise, let us define a “checkerboard matrix” to be a square matrix $A = [a_{ij}]$
such that

$$a_{ij} = \begin{cases} 1 & \text{if } i + j \text{ is even} \\ 0 & \text{if } i + j \text{ is odd} \end{cases}$$

Find the rank and nullity of the following checkerboard matrices.
(a) The 3 × 3 checkerboard matrix.
(b) The 4 × 4 checkerboard matrix.
(c) The n × n checkerboard matrix.


Answer:

(a) Rank = 2, nullity = 1
(b) Rank = 2, nullity = 2
(c) Rank = 2, nullity = n − 2


10. For the purpose of this exercise, let us define an “X-matrix” to be a square matrix with an odd number of
rows and columns that has 0's everywhere except on the two diagonals where it has 1's. Find the rank and
nullity of the following X-matrices.

(a) $\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \end{bmatrix}$

(c) the X-matrix of size (2n + 1) × (2n + 1)


11. In each part, show that the stated set of polynomials is a subspace of $P_n$ and find a basis for it.

(a) All polynomials in $P_n$ such that p(−x) = p(x).
(b) All polynomials in $P_n$ such that p(0) = 0.


Answer:

(a) $\{1, x^2, x^4, \ldots, x^{2m}\}$ where 2m = n if n is even and 2m = n − 1 if n is odd.
(b) $\{x, x^2, \ldots, x^n\}$


12. (Calculus required) Show that the set of all polynomials in $P_n$ that have a horizontal tangent at x = 0 is a
subspace of $P_n$. Find a basis for this subspace.


13. (a) Find a basis for the vector space of all 3 × 3 symmetric matrices.

(b) Find a basis for the vector space of all 3 × 3 skew-symmetric matrices.


Answer:

(a) $\left\{ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right\}$

(b) $\left\{ \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix} \right\}$


14. Various advanced texts in linear algebra prove the following determinant criterion for rank: The rank of a
matrix A is r if and only if A has some r × r submatrix with a nonzero determinant, and all square
submatrices of larger size have determinant zero. [Note: A submatrix of A is any matrix obtained by
deleting rows or columns of A. The matrix A itself is also considered to be a submatrix of A.] In each part,
use this criterion to find the rank of the matrix.


(a) {12 0 
E t | 

(by [1 2 3 
Y 4 | 


(ce) | 4. 0°] 


(d)
\[ \begin{bmatrix} 1 & -1 & 2 & 0 \\ 3 & 1 & 0 & 0 \\ -1 & 2 & 4 & 0 \end{bmatrix} \]

15. Use the result in Exercise 14 above to find the possible ranks for matrices of the form 
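\[ \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & a_{16} \\
0 & 0 & 0 & 0 & 0 & a_{26} \\
0 & 0 & 0 & 0 & 0 & a_{36} \\
0 & 0 & 0 & 0 & 0 & a_{46} \\
a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & a_{56}
\end{bmatrix} \]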


Answer: 


Possible ranks are 2, 1, and 0. 


16. Prove: If S is a basis for a vector space V, then for any vectors u and v in V and any scalar k, the following
relationships hold.


(a) (u + v)_S = (u)_S + (v)_S
(b) (ku)_S = k(u)_S




CHAPTER 5


Eigenvalues and 
Eigenvectors 


CHAPTER CONTENTS 


5.1. Eigenvalues and Eigenvectors 
5.2. Diagonalization 

5.3. Complex Vector Spaces 

5.4. Differential Equations 


INTRODUCTION 


In this chapter we will focus on classes of scalars and vectors known as “eigenvalues” and 
“eigenvectors,” terms derived from the German word eigen, meaning “own,” “peculiar 
to,” “characteristic,” or “individual.” The underlying idea first appeared in the study of 
rotational motion but was later used to classify various kinds of surfaces and to describe 
solutions of certain differential equations. In the early 1900s it was applied to matrices and 
matrix transformations, and today it has applications in such diverse fields as computer 
graphics, mechanical vibrations, heat flow, population dynamics, quantum mechanics, and 
economics to name just a few. 




5.1 Eigenvalues and Eigenvectors 


In this section we will define the notions of “eigenvalue” and “eigenvector” and discuss some of their basic 
properties. 


Definition of Eigenvalue and Eigenvector 


We begin with the main definition in this section. 


DEFINITION 1 


If A is an n × n matrix, then a nonzero vector x in R^n is called an eigenvector of A (or of the matrix
operator T_A) if Ax is a scalar multiple of x; that is,

\[ A\mathbf{x} = \lambda\mathbf{x} \]

for some scalar λ. The scalar λ is called an eigenvalue of A (or of T_A), and x is said to be an
eigenvector corresponding to λ.


The requirement that an eigenvector be
nonzero is imposed to avoid the unimportant
case A0 = λ0, which holds for every A and λ.


In general, the image of a vector x under multiplication by a square matrix A differs from x in both magnitude
and direction. However, in the special case where x is an eigenvector of A, multiplication by A leaves the
direction unchanged. For example, in R^2 or R^3 multiplication by A maps each eigenvector x of A (if any)
along the same line through the origin as x. Depending on the sign and magnitude of the eigenvalue λ
corresponding to x, the operation Ax = λx compresses or stretches x by a factor of λ, with a reversal of
direction in the case where λ is negative (Figure 5.1.1).


[Figure panels, each showing x and λx: (a) 0 ≤ λ ≤ 1; (b) λ ≥ 1; (c) −1 ≤ λ ≤ 0; (d) λ ≤ −1]


Figure 5.1.1 


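EXAMPLE 1 Eigenvector of a 2 × 2 Matrix


The vector

\[ \mathbf{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \]

is an eigenvector of

\[ A = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix} \]

corresponding to the eigenvalue λ = 3, since

\[ A\mathbf{x} = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \end{bmatrix} = 3\mathbf{x} \]

Geometrically, multiplication by A has stretched the vector x by a factor of 3 (Figure 5.1.2).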
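
The check in Example 1 can also be carried out mechanically. The following is a minimal sketch, assuming A has rows (3, 0) and (8, −1) (the matrix whose characteristic determinant appears in Example 2) and x = (1, 2); the helper name mat_vec is illustrative and not part of the text.

```python
def mat_vec(A, x):
    """Multiply a matrix (given as a list of rows) by a vector (a list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Assumed data: the matrix and vector of Example 1.
A = [[3, 0],
     [8, -1]]
x = [1, 2]
lam = 3

Ax = mat_vec(A, x)
# x is an eigenvector for lam if x is nonzero and Ax equals lam * x.
is_eigenvector = any(xi != 0 for xi in x) and Ax == [lam * xi for xi in x]
print(Ax, is_eigenvector)
```

The nonzero test mirrors the requirement in Definition 1 that an eigenvector be nonzero.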


Figure 5.1.2 


Computing Eigenvalues and Eigenvectors 


Our next objective is to obtain a general procedure for finding eigenvalues and eigenvectors of an n × n
matrix A. We will begin with the problem of finding the eigenvalues of A. Note first that the equation
Ax = λx can be rewritten as Ax = λIx, or equivalently, as

\[ (\lambda I - A)\mathbf{x} = \mathbf{0} \]

For λ to be an eigenvalue of A this equation must have a nonzero solution for x. But it follows from parts (b)
and (g) of Theorem 4.10.4 that this is so if and only if the coefficient matrix λI − A has a zero determinant.
Thus, we have the following result.


THEOREM 5.1.1 


If A is an n × n matrix, then λ is an eigenvalue of A if and only if it satisfies the equation

\[ \det(\lambda I - A) = 0 \tag{1} \]


This is called the characteristic equation of A. 


EXAMPLE 2 Finding Eigenvalues


In Example 1 we observed that λ = 3 is an eigenvalue of the matrix

\[ A = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix} \]

but we did not explain how we found it. Use the characteristic equation to find all eigenvalues
of this matrix.


Solution It follows from Formula 1 that the eigenvalues of A are the solutions of the equation
det(λI − A) = 0, which we can write as

\[ \begin{vmatrix} \lambda - 3 & 0 \\ -8 & \lambda + 1 \end{vmatrix} = 0 \]

from which we obtain

\[ (\lambda - 3)(\lambda + 1) = 0 \tag{2} \]

This shows that the eigenvalues of A are λ = 3 and λ = −1. Thus, in addition to the
eigenvalue λ = 3 noted in Example 1, we have discovered a second eigenvalue λ = −1.
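
For a 2 × 2 matrix with rows (a, b) and (c, d), expanding det(λI − A) gives λ^2 − (a + d)λ + (ad − bc), so the eigenvalues are the roots of a quadratic. The sketch below applies the quadratic formula to the matrix of Example 2; the function name eigenvalues_2x2 is an illustrative choice, and the function reports only real eigenvalues.

```python
import math

def eigenvalues_2x2(A):
    """Real eigenvalues of a 2 x 2 matrix, from its characteristic equation
    lambda^2 - (a + d) * lambda + (a*d - b*c) = 0."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        return []  # the roots are complex; no real eigenvalues
    root = math.sqrt(disc)
    # A set removes the duplicate when the root is repeated.
    return sorted({(trace - root) / 2, (trace + root) / 2})

print(eigenvalues_2x2([[3, 0], [8, -1]]))  # [-1.0, 3.0]
```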


When the determinant det(λI − A) that appears on the left side of (1) is expanded, the result is a polynomial
p(λ) of degree n that is called the characteristic polynomial of A. For example, it follows from (2) that the
characteristic polynomial of the 2 × 2 matrix A in Example 2 is

\[ p(\lambda) = (\lambda - 3)(\lambda + 1) = \lambda^2 - 2\lambda - 3 \]

which is a polynomial of degree 2. In general, the characteristic polynomial of an n × n matrix has the form

\[ p(\lambda) = \lambda^n + c_1\lambda^{n-1} + \cdots + c_n \]

in which the coefficient of λ^n is 1 (Exercise 17). Since a polynomial of degree n has at most n distinct roots, it
follows that the equation

\[ \lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0 \tag{3} \]

has at most n distinct solutions and consequently that an n × n matrix has at most n distinct eigenvalues. Since
some of these solutions may be complex numbers, it is possible for a matrix to have complex eigenvalues,
even if that matrix itself has real entries. We will discuss this issue in more detail later, but for now we will
focus on examples in which the eigenvalues are real numbers.


In applications involving large matrices
it is often not feasible to compute the
characteristic equation directly, so other
methods must be used to find
eigenvalues. We will consider such
methods in Chapter 9.


EXAMPLE 3 Eigenvalues of a 3 × 3 Matrix


Find the eigenvalues of

\[ A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 4 & -17 & 8 \end{bmatrix} \]

Solution The characteristic polynomial of A is

\[ \det(\lambda I - A) = \det \begin{bmatrix} \lambda & -1 & 0 \\ 0 & \lambda & -1 \\ -4 & 17 & \lambda - 8 \end{bmatrix} = \lambda^3 - 8\lambda^2 + 17\lambda - 4 \]

The eigenvalues of A must therefore satisfy the cubic equation

\[ \lambda^3 - 8\lambda^2 + 17\lambda - 4 = 0 \tag{4} \]

To solve this equation, we will begin by searching for integer solutions. This task can be
simplified by exploiting the fact that all integer solutions (if there are any) of a polynomial
equation with integer coefficients

\[ \lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0 \]

must be divisors of the constant term, c_n. Thus, the only possible integer solutions of (4) are the
divisors of −4, that is, ±1, ±2, ±4. Successively substituting these values in (4) shows that
λ = 4 is an integer solution. As a consequence, λ − 4 must be a factor of the left side of (4).
Dividing λ − 4 into λ^3 − 8λ^2 + 17λ − 4 shows that (4) can be rewritten as

\[ (\lambda - 4)(\lambda^2 - 4\lambda + 1) = 0 \]

Thus, the remaining solutions of (4) satisfy the quadratic equation

\[ \lambda^2 - 4\lambda + 1 = 0 \]

which can be solved by the quadratic formula. Thus the eigenvalues of A are

\[ \lambda = 4, \quad \lambda = 2 + \sqrt{3}, \quad \text{and} \quad \lambda = 2 - \sqrt{3} \]
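
The search strategy of Example 3 (test the divisors of the constant term, divide out the factor found, then apply the quadratic formula) can be sketched as follows. The coefficient list [1, −8, 17, −4] encodes λ^3 − 8λ^2 + 17λ − 4; the function names are illustrative, and the sketch assumes integer coefficients with a nonzero constant term.

```python
import math

def poly_eval(coeffs, x):
    """Evaluate a polynomial by Horner's rule; coeffs run from the highest power down."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def integer_roots(coeffs):
    """Integer roots, found by testing every divisor of the (nonzero) constant term."""
    constant = abs(coeffs[-1])
    divisors = [d for d in range(1, constant + 1) if constant % d == 0]
    candidates = [s * d for d in divisors for s in (1, -1)]
    return [r for r in candidates if poly_eval(coeffs, r) == 0]

def deflate(coeffs, root):
    """Synthetic division by (x - root); returns the quotient's coefficients."""
    quotient = [coeffs[0]]
    for c in coeffs[1:-1]:
        quotient.append(c + root * quotient[-1])
    return quotient

p = [1, -8, 17, -4]                 # characteristic polynomial of Example 3
roots = integer_roots(p)            # candidates tested: 1, -1, 2, -2, 4, -4
a, b, c = deflate(p, roots[0])      # remaining quadratic factor
disc = b * b - 4 * a * c
others = [(-b + math.sqrt(disc)) / (2 * a), (-b - math.sqrt(disc)) / (2 * a)]
print(roots, [a, b, c], others)
```

Here the only integer root is 4, the quotient is λ^2 − 4λ + 1, and the two remaining values are floating-point approximations of 2 + √3 and 2 − √3.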


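EXAMPLE 4 Eigenvalues of an Upper Triangular Matrix


Find the eigenvalues of the upper triangular matrix

\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ 0 & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & a_{44} \end{bmatrix} \]

Solution Recalling that the determinant of a triangular matrix is the product of the entries on
the main diagonal (Theorem 2.1.2), we obtain

\[ \det(\lambda I - A) = \det \begin{bmatrix} \lambda - a_{11} & -a_{12} & -a_{13} & -a_{14} \\ 0 & \lambda - a_{22} & -a_{23} & -a_{24} \\ 0 & 0 & \lambda - a_{33} & -a_{34} \\ 0 & 0 & 0 & \lambda - a_{44} \end{bmatrix} = (\lambda - a_{11})(\lambda - a_{22})(\lambda - a_{33})(\lambda - a_{44}) \]

Thus, the characteristic equation is

\[ (\lambda - a_{11})(\lambda - a_{22})(\lambda - a_{33})(\lambda - a_{44}) = 0 \]

and the eigenvalues are

\[ \lambda = a_{11}, \quad \lambda = a_{22}, \quad \lambda = a_{33}, \quad \lambda = a_{44} \]

which are precisely the diagonal entries of A.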


The following general theorem should be evident from the computations in the preceding example. 


THEOREM 5.1.2 


If A is an n × n triangular matrix (upper triangular, lower triangular, or diagonal), then the eigenvalues
of A are the entries on the main diagonal of A.


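EXAMPLE 5 Eigenvalues of a Lower Triangular Matrix


By inspection, the eigenvalues of the lower triangular matrix

\[ A = \begin{bmatrix} \tfrac{1}{2} & 0 & 0 \\ -1 & \tfrac{2}{3} & 0 \\ 5 & -8 & -\tfrac{1}{4} \end{bmatrix} \]

are λ = 1/2, λ = 2/3, and λ = −1/4.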


Had Theorem 5.1.2 been available earlier, we 
could have anticipated the result obtained in 
Example 2. 


THEOREM 5.1.3 


If A is an n × n matrix, the following statements are equivalent.

(a) λ is an eigenvalue of A.

(b) The system of equations (λI − A)x = 0 has nontrivial solutions.

(c) There is a nonzero vector x such that Ax = λx.

(d) λ is a solution of the characteristic equation det(λI − A) = 0.


Finding Eigenvectors and Bases for Eigenspaces 


Now that we know how to find the eigenvalues of a matrix, we will consider the problem of finding the
corresponding eigenvectors. Since the eigenvectors corresponding to an eigenvalue λ of a matrix A are the
nonzero vectors that satisfy the equation

\[ (\lambda I - A)\mathbf{x} = \mathbf{0} \]

these eigenvectors are the nonzero vectors in the null space of the matrix λI − A. We call this null space the
eigenspace of A corresponding to λ. Stated another way, the eigenspace of A corresponding to the eigenvalue
λ is the solution space of the homogeneous system (λI − A)x = 0.


Notice that x = 0 is in every eigenspace even
though it is not an eigenvector. Thus, it is the
nonzero vectors in an eigenspace that are the
eigenvectors.


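EXAMPLE 6 Bases for Eigenspaces


Find bases for the eigenspaces of the matrix

\[ A = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix} \]

Solution In Example 2 we found the characteristic equation of A to be

\[ (\lambda - 3)(\lambda + 1) = 0 \]

from which we obtained the eigenvalues λ = 3 and λ = −1. Thus, there are two eigenspaces
of A, one corresponding to each of these eigenvalues.


By definition,

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

is an eigenvector of A corresponding to an eigenvalue λ if and only if x is a nontrivial solution
of (λI − A)x = 0, that is, of

\[ \begin{bmatrix} \lambda - 3 & 0 \\ -8 & \lambda + 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

If λ = 3, then this equation becomes

\[ \begin{bmatrix} 0 & 0 \\ -8 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

whose general solution is

\[ x_1 = \tfrac{1}{2}t, \quad x_2 = t \]

(verify) or in matrix form,

\[ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2}t \\ t \end{bmatrix} = t \begin{bmatrix} \tfrac{1}{2} \\ 1 \end{bmatrix} \]

Thus,

\[ \begin{bmatrix} \tfrac{1}{2} \\ 1 \end{bmatrix} \]

is a basis for the eigenspace corresponding to λ = 3. We leave it as an exercise for you to
follow the pattern of these computations and show that

\[ \begin{bmatrix} 0 \\ 1 \end{bmatrix} \]

is a basis for the eigenspace corresponding to λ = −1.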
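
Since an eigenspace is the null space of λI − A, a basis can be computed by row reduction. The following is a minimal sketch using exact rational arithmetic, applied to the matrix with rows (3, 0) and (8, −1) from Example 6; the function names null_space_basis and eigenspace_basis are illustrative, and the basis returned may list vectors in a different order or scaling than a hand computation.

```python
from fractions import Fraction

def null_space_basis(M):
    """Basis for the solution space of Mx = 0, via Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in row] for row in M]
    n_cols = len(rows[0])
    pivots = []          # pivot column of each nonzero row, in order
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue     # no pivot here: column c is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [v / rows[r][c] for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [vi - factor * vr for vi, vr in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)          # set one free variable to 1
        for row_index, p in enumerate(pivots):
            vec[p] = -rows[row_index][free]
        basis.append(vec)
    return basis

def eigenspace_basis(A, lam):
    """Basis for the null space of (lam * I - A)."""
    n = len(A)
    M = [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    return null_space_basis(M)

A = [[3, 0], [8, -1]]
print(eigenspace_basis(A, 3))    # one vector: (1/2, 1)
print(eigenspace_basis(A, -1))   # one vector: (0, 1)
```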


Historical Note Methods of linear algebra are used in the emerging field of computerized face 
recognition. Researchers are working with the idea that every human face in a racial group is a 
combination of a few dozen primary shapes. For example, by analyzing three-dimensional scans of 
many faces, researchers at Rockefeller University have produced both an average head shape in the 


Caucasian group—dubbed the meanhead (top row left in the figure to the left)—and a set of 
standardized variations from that shape, called eigenheads (15 of which are shown in the picture). 
These are so named because they are eigenvectors of a certain matrix that stores digitized facial 
information. Face shapes are represented mathematically as linear combinations of the eigenheads. 
[Image: Courtesy Dr. Joseph Atick, Dr. Norman Redlich, and Dr. Paul Griffith] 


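EXAMPLE 7 Eigenvectors and Bases for Eigenspaces


Find bases for the eigenspaces of

\[ A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix} \]

Solution The characteristic equation of A is λ^3 − 5λ^2 + 8λ − 4 = 0, or in factored form,
(λ − 1)(λ − 2)^2 = 0 (verify). Thus, the distinct eigenvalues of A are λ = 1 and λ = 2, so there
are two eigenspaces of A.


By definition,

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \]

is an eigenvector of A corresponding to λ if and only if x is a nontrivial solution of
(λI − A)x = 0, or in matrix form,

\[ \begin{bmatrix} \lambda & 0 & 2 \\ -1 & \lambda - 2 & -1 \\ -1 & 0 & \lambda - 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{5} \]

In the case where λ = 2, Formula 5 becomes

\[ \begin{bmatrix} 2 & 0 & 2 \\ -1 & 0 & -1 \\ -1 & 0 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]

Solving this system using Gaussian elimination yields (verify)

\[ x_1 = -s, \quad x_2 = t, \quad x_3 = s \]

Thus, the eigenvectors of A corresponding to λ = 2 are the nonzero vectors of the form

\[ \mathbf{x} = \begin{bmatrix} -s \\ t \\ s \end{bmatrix} = \begin{bmatrix} -s \\ 0 \\ s \end{bmatrix} + \begin{bmatrix} 0 \\ t \\ 0 \end{bmatrix} = s \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} + t \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \]

Since

\[ \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \]

are linearly independent (why?), these vectors form a basis for the eigenspace corresponding to
λ = 2.


If λ = 1, then (5) becomes

\[ \begin{bmatrix} 1 & 0 & 2 \\ -1 & -1 & -1 \\ -1 & 0 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]

Solving this system yields (verify)

\[ x_1 = -2s, \quad x_2 = s, \quad x_3 = s \]

Thus, the eigenvectors corresponding to λ = 1 are the nonzero vectors of the form

\[ \begin{bmatrix} -2s \\ s \\ s \end{bmatrix} = s \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} \quad \text{so that} \quad \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} \]

is a basis for the eigenspace corresponding to λ = 1.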


Powers of a Matrix 


Once the eigenvalues and eigenvectors of a matrix A are found, it is a simple matter to find the eigenvalues
and eigenvectors of any positive integer power of A; for example, if λ is an eigenvalue of A and x is a
corresponding eigenvector, then

\[ A^2\mathbf{x} = A(A\mathbf{x}) = A(\lambda\mathbf{x}) = \lambda(A\mathbf{x}) = \lambda(\lambda\mathbf{x}) = \lambda^2\mathbf{x} \]

which shows that λ^2 is an eigenvalue of A^2 and that x is a corresponding eigenvector. In general, we have the
following result.


THEOREM 5.1.4 


If k is a positive integer, λ is an eigenvalue of a matrix A, and x is a corresponding eigenvector, then
λ^k is an eigenvalue of A^k and x is a corresponding eigenvector.


EXAMPLE 8 Powers of a Matrix


In Example 7 we showed that the eigenvalues of

\[ A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix} \]

are λ = 2 and λ = 1, so from Theorem 5.1.4 both λ = 2^7 = 128 and λ = 1^7 = 1 are eigenvalues of
A^7. We also showed that

\[ \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \]

are eigenvectors of A corresponding to the eigenvalue λ = 2, so from Theorem 5.1.4 they are also
eigenvectors of A^7 corresponding to λ = 2^7 = 128. Similarly, the eigenvector

\[ \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} \]

of A corresponding to the eigenvalue λ = 1 is also an eigenvector of A^7 corresponding to
λ = 1^7 = 1.
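
Theorem 5.1.4 can be spot-checked numerically. The sketch below forms the seventh power of the matrix with rows (0, 0, −2), (1, 2, 1), (1, 0, 3) by repeated multiplication and confirms that each eigenvector found in Example 7 is scaled by the seventh power of its eigenvalue. The helper names are illustrative; integer arithmetic keeps the comparison exact.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(A, k):
    """A**k for a positive integer k, by repeated multiplication."""
    result = A
    for _ in range(k - 1):
        result = mat_mul(result, A)
    return result

def mat_vec(A, x):
    """Matrix times vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[0, 0, -2],
     [1, 2, 1],
     [1, 0, 3]]
A7 = mat_pow(A, 7)

# (eigenvalue of A, corresponding eigenvector) pairs from Example 7
pairs = [(2, [-1, 0, 1]), (2, [0, 1, 0]), (1, [-2, 1, 1])]
all_ok = all(mat_vec(A7, x) == [lam ** 7 * xi for xi in x] for lam, x in pairs)
print(all_ok)
```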


Eigenvalues and Invertibility 


The next theorem establishes a relationship between eigenvalues and the invertibility of a matrix. 


THEOREM 5.1.5 


A square matrix A is invertible if and only if λ = 0 is not an eigenvalue of A.


Proof Assume that A is an n × n matrix and observe first that λ = 0 is a solution of the characteristic
equation

\[ \lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0 \]

if and only if the constant term c_n is zero. Thus, it suffices to prove that A is invertible if and only if c_n ≠ 0.
But

\[ \det(\lambda I - A) = \lambda^n + c_1\lambda^{n-1} + \cdots + c_n \]

or, on setting λ = 0,

\[ \det(-A) = c_n \quad \text{or} \quad (-1)^n \det(A) = c_n \]

It follows from the last equation that det(A) = 0 if and only if c_n = 0, and this in turn implies that A is
invertible if and only if c_n ≠ 0.


EXAMPLE 9 Eigenvalues and Invertibility


The matrix A in Example 7 is invertible since it has eigenvalues λ = 1 and λ = 2, neither of which
is zero. We leave it for you to check this conclusion by showing that det(A) ≠ 0.
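
The check suggested in Example 9 can be sketched as follows, taking A to be the matrix with rows (0, 0, −2), (1, 2, 1), (1, 0, 3) shown in Example 8. The cofactor-expansion routine det is an illustrative helper that is adequate for small matrices only. The last line also confirms the relation (−1)^n det(A) = c_n from the proof of Theorem 5.1.5, using the constant term −4 of the characteristic polynomial stated in Example 7.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        # Minor: delete row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

A = [[0, 0, -2],
     [1, 2, 1],
     [1, 0, 3]]
n = len(A)
c_n = -4                      # constant term of lambda^3 - 5 lambda^2 + 8 lambda - 4

print(det(A))                 # nonzero, so A is invertible
print((-1) ** n * det(A) == c_n)
```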


More on the Equivalence Theorem 


As our final result in this section, we will use Theorem 5.1.5 to add one additional part to Theorem 4.10.4. 


THEOREM 5.1.6 Equivalent Statements 


If A is an n × n matrix, then the following statements are equivalent.

(a) A is invertible.

(b) Ax = 0 has only the trivial solution.

(c) The reduced row echelon form of A is I_n.

(d) A is expressible as a product of elementary matrices.

(e) Ax = b is consistent for every n × 1 matrix b.

(f) Ax = b has exactly one solution for every n × 1 matrix b.

(g) det(A) ≠ 0.

(h) The column vectors of A are linearly independent.

(i) The row vectors of A are linearly independent.

(j) The column vectors of A span R^n.

(k) The row vectors of A span R^n.

(l) The column vectors of A form a basis for R^n.

(m) The row vectors of A form a basis for R^n.

(n) A has rank n.

(o) A has nullity 0.

(p) The orthogonal complement of the null space of A is R^n.

(q) The orthogonal complement of the row space of A is {0}.

(r) The range of T_A is R^n.

(s) T_A is one-to-one.

(t) λ = 0 is not an eigenvalue of A.


This theorem relates all of the major topics we have studied thus far. 


Concept Review

• Eigenvector

• Eigenvalue

• Characteristic equation

• Characteristic polynomial

• Eigenspace

• Equivalence Theorem

Skills

• Find the eigenvalues of a matrix.

• Find bases for the eigenspaces of a matrix.


Exercise Set 5.1 


In Exercises 1—2, confirm by multiplication that x is an eigenvector of A, and find the corresponding 
eigenvalue. 


Answer 
5 
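2.
\[ A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}; \quad \mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \]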


3. Find the characteristic equations of the following matrices: 


(a)
\[ \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix} \]

(b)
\[ \begin{bmatrix} 10 & -9 \\ 4 & -2 \end{bmatrix} \]


Answer: 


(a) λ^2 − 2λ − 3 = 0
(b) λ^2 − 8λ + 16 = 0
(c) λ^2 − 12 = 0
(d) λ^2 + 3 = 0
(e) λ^2 = 0
(f) λ^2 − 2λ + 1 = 0


4. Find the eigenvalues of the matrices in Exercise 3.


5. Find bases for the eigenspaces of the matrices in Exercise 3 


Answer: 
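(a) Basis for eigenspace corresponding to λ = 3: \( \begin{bmatrix} \tfrac{1}{2} \\ 1 \end{bmatrix} \); basis for eigenspace corresponding to λ = −1: \( \begin{bmatrix} 0 \\ 1 \end{bmatrix} \)

(b) Basis for eigenspace corresponding to λ = 4: \( \begin{bmatrix} \tfrac{3}{2} \\ 1 \end{bmatrix} \)

(c) Basis for eigenspace corresponding to λ = √12: \( \begin{bmatrix} \tfrac{3}{\sqrt{12}} \\ 1 \end{bmatrix} \); basis for eigenspace corresponding to λ = −√12: \( \begin{bmatrix} -\tfrac{3}{\sqrt{12}} \\ 1 \end{bmatrix} \)


(d) There are no eigenspaces.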


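(e) Basis for eigenspace corresponding to λ = 0: \( \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \)


(f) Basis for eigenspace corresponding to λ = 1: \( \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \)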


6. Find the characteristic equations of the following matrices: 
(a)
\[ \begin{bmatrix} 4 & 0 & 1 \\ -2 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix} \]


(b) 


(d) 


ae re 
O —1 =—8 
1 O =—2 


(f) 


| 
is 

2 9 

| 

| 

| 


7. Find the eigenvalues of the matrices in Exercise 6. 


Answer: 


(a) 1,2,3 
(b) −√2, 0, √2
(c) -8 
(d) 2 
(e) 2 
(f) —4,3 
8. Find bases for the eigenspaces of the matrices in Exercise 6. 


9. Find the characteristic equations of the following matrices: 


@ foo 20 
io dx 
4 230 
C20. “Ora 

(b) [10 -9 oO 0 
oe ae | 
0. Oo Sj 
Gc ow 4: 2 


Answer: 


Ee Geo eee) Cemetery ae 
(b) λ^4 − 8λ^3 + 19λ^2 − 24λ + 48 = 0


10. Find the eigenvalues of the matrices in Exercise 9. 


11. Find bases for the eigenspaces of the matrices in Exercise 9. 


Answer: 
(a) 2] fo = 
Meriva : ; Meet Sade : 
0 1 0 
(b) EI 
2 
A=4:basis | 1 
0 
0 


; A= =—1:basis 


12. By inspection, find the eigenvalues of the following matrices: 


(b)
\[ \begin{bmatrix} 3 & 0 & 0 \\ -2 & 7 & 0 \\ 4 & 8 & 1 \end{bmatrix} \]

(c) | _1 
7 00:0 

1 
0-7 0 0 
0 010 
1 
0 005 


13. Find the eigenvalues of A^9 for


oo ow 
SONI Ww 


Answer: 


9 
i We eee Nee See 
1, ( = sty, P= 512 


=—2 


1 
1 
0 


14. Find the eigenvalues and bases for the eigenspaces of A^{25} for

\[ A = \begin{bmatrix} -1 & -2 & -2 \\ 1 & 2 & 1 \\ -1 & -1 & 0 \end{bmatrix} \]


15. Let A be a 2 × 2 matrix, and call a line through the origin of R^2 invariant under A if Ax lies on the line
when x does. Find equations for all lines in R^2, if any, that are invariant under the given matrix.


(a)
\[ A = \begin{bmatrix} 4 & -1 \\ 2 & 1 \end{bmatrix} \]

(b)
\[ A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \]

(c)
\[ A = \begin{bmatrix} 2 & 3 \\ 0 & 2 \end{bmatrix} \]


Answer:


(a) y = x and y = 2x

(b) No lines

(c) y = 0

16. Find det(A) given that A has p(λ) as its characteristic polynomial.

(a) p(λ) = λ^3 − 2λ^2 + λ + 5

(b) p(λ) = λ^4 − λ^3 + 7

[Hint: See the proof of Theorem 5.1.5.]

17. Let A be an n × n matrix.


(a) Prove that the characteristic polynomial of A has degree n.


(b) Prove that the coefficient of λ^n in the characteristic polynomial is 1.


18. Show that the characteristic equation of a 2 × 2 matrix A can be expressed as λ^2 − tr(A)λ + det(A) = 0,
where tr(A) is the trace of A.


19. Use the result in Exercise 18 to show that if

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

then the solutions of the characteristic equation of A are

\[ \lambda = \tfrac{1}{2}\left[ (a + d) \pm \sqrt{(a - d)^2 + 4bc} \right] \]

Use this result to show that A has

(a) two distinct real eigenvalues if (a − d)^2 + 4bc > 0.


(b) two repeated real eigenvalues if (a − d)^2 + 4bc = 0.


(c) complex conjugate eigenvalues if (a − d)^2 + 4bc < 0.

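20. Let A be the matrix in Exercise 19. Show that if b ≠ 0, then

\[ \mathbf{x}_1 = \begin{bmatrix} -b \\ a - \lambda_1 \end{bmatrix} \quad \text{and} \quad \mathbf{x}_2 = \begin{bmatrix} -b \\ a - \lambda_2 \end{bmatrix} \]

are eigenvectors of A that correspond, respectively, to the eigenvalues

\[ \lambda_1 = \tfrac{1}{2}\left[ (a + d) + \sqrt{(a - d)^2 + 4bc} \right] \]

and

\[ \lambda_2 = \tfrac{1}{2}\left[ (a + d) - \sqrt{(a - d)^2 + 4bc} \right] \]
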
21. Use the result of Exercise 18 to prove that if p(λ) is the characteristic polynomial of a 2 × 2 matrix A,
then p(A) = 0.

22. Prove: If a, b, c, and d are integers such that a + b = c + d, then

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

has integer eigenvalues, namely λ1 = a + b and λ2 = a − c.


23. Prove: If λ is an eigenvalue of an invertible matrix A, and x is a corresponding eigenvector, then 1/λ is
an eigenvalue of A^{-1}, and x is a corresponding eigenvector.


24. Prove: If λ is an eigenvalue of A, x is a corresponding eigenvector, and s is a scalar, then λ − s is an
eigenvalue of A − sI, and x is a corresponding eigenvector.


25. Prove: If λ is an eigenvalue of A and x is a corresponding eigenvector, then sλ is an eigenvalue of sA for
every scalar s, and x is a corresponding eigenvector.


26. Find the eigenvalues and bases for the eigenspaces of


ts 
II 
| 
nN 
Now DY 
AN wo 


and then use Exercises 23 and 24 to find the eigenvalues and bases for the eigenspaces of

(a) A^{-1}

(b) A − 3I

(c) A + 2I


27. (a) Prove that if A is a square matrix, then A and A^T have the same eigenvalues. [Hint: Look at the
characteristic equation det(λI − A) = 0.]

(b) Show that A and A^T need not have the same eigenspaces. [Hint: Use the result in Exercise 20 to find
a 2 × 2 matrix for which A and A^T have different eigenspaces.]


28. Suppose that the characteristic polynomial of some matrix A is found to be
p(λ) = (λ − 1)(λ − 3)^2(λ − 4)^3. In each part, answer the question and explain your reasoning.

(a) What is the size of A?

(b) Is A invertible?


(c) How many eigenspaces does A have?


29. The eigenvectors that we have been studying are sometimes called right eigenvectors to distinguish them
from left eigenvectors, which are n × 1 column matrices x that satisfy the equation x^T A = μx^T for some
scalar μ. What is the relationship, if any, between the right eigenvectors and corresponding eigenvalues λ
of A and the left eigenvectors and corresponding eigenvalues μ of A?


True-False Exercises 

In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 

(a) If A is a square matrix and Ax = λx for some nonzero scalar λ, then x is an eigenvector of A.
Answer: 


False 


(b) If λ is an eigenvalue of a matrix A, then the linear system (λI − A)x = 0 has only the trivial solution.
Answer: 


False 


(c) If the characteristic polynomial of a matrix A is p(λ) = λ^2 + 1, then A is invertible.


Answer: 


True 


(d) If λ is an eigenvalue of a matrix A, then the eigenspace of A corresponding to λ is the set of eigenvectors
of A corresponding to λ.


Answer: 


False 


(e) If 0 is an eigenvalue of a matrix A, then A^2 is singular.


Answer: 


True 


(f) The eigenvalues of a matrix A are the same as the eigenvalues of the reduced row echelon form of A. 
Answer: 


False 


(g) If 0 is an eigenvalue of a matrix A, then the set of columns of A is linearly independent. 
Answer: 


False 




5.2 Diagonalization


In this section we will be concerned with the problem of finding a basis for R^n that consists of eigenvectors of an
n × n matrix A. Such bases can be used to study geometric properties of A and to simplify various numerical
computations. These bases are also of physical significance in a wide variety of applications, some of which will be
considered later in this text.


The Matrix Diagonalization Problem 


Our first objective in this section is to show that the following two seemingly different problems are equivalent. 


Problem 1 Given an n × n matrix A, does there exist an invertible matrix P such that P^{-1}AP is diagonal?


Problem 2 Given an n × n matrix A, does A have n linearly independent eigenvectors?


Similarity 


The matrix product P^{-1}AP that appears in Problem 1 is called a similarity transformation of the matrix A. Such
products are important in the study of eigenvectors and eigenvalues, so we will begin with some terminology about 
them. 


DEFINITION 1 


If A and B are square matrices, then we say that B is similar to A if there is an invertible matrix P such that
B = P^{-1}AP.


Note that if B is similar to A, then it is also true that A is similar to B, since we can express A as A = Q^{-1}BQ by
taking Q = P^{-1}. This being the case, we will usually say that A and B are similar matrices if either is similar to
the other.


Similarity Invariants 


Similar matrices have many properties in common. For example, if B = P^{-1}AP, then it follows that A and B have
the same determinant, since

\[ \det(B) = \det(P^{-1}AP) = \det(P^{-1})\det(A)\det(P) = \frac{1}{\det(P)}\det(A)\det(P) = \det(A) \]


In general, any property that is shared by all similar matrices is called a similarity invariant or is said to be 
invariant under similarity. Table 1 lists the most important similarity invariants. The proofs of some of these 
results are given as exercises. 


Table 1 Similarity Invariants

Property                     Description
Determinant                  A and P^{-1}AP have the same determinant.
Invertibility                A is invertible if and only if P^{-1}AP is invertible.
Rank                         A and P^{-1}AP have the same rank.
Nullity                      A and P^{-1}AP have the same nullity.
Trace                        A and P^{-1}AP have the same trace.
Characteristic polynomial    A and P^{-1}AP have the same characteristic polynomial.
Eigenvalues                  A and P^{-1}AP have the same eigenvalues.
Eigenspace dimension         If λ is an eigenvalue of A and hence of P^{-1}AP, then the eigenspace of A
                             corresponding to λ and the eigenspace of P^{-1}AP corresponding to λ have the
                             same dimension.


Expressed in the language of similarity, Problem 1 posed above is equivalent to asking whether the matrix A is 
similar to a diagonal matrix. If so, the diagonal matrix will have all of the similarity-invariant properties of A, but 
will have a simpler form, making it easier to analyze and work with. This important idea has some associated 


terminology. 


DEFINITION 2 


A square matrix A is said to be diagonalizable if it is similar to some diagonal matrix; that is, if there exists
an invertible matrix P such that P^{-1}AP is diagonal. In this case the matrix P is said to diagonalize A.


The following theorem shows that Problems 1 and 2 posed above are actually two different forms of the same


mathematical problem. 


THEOREM 5.2.1 


If A is an n × n matrix, the following statements are equivalent.


(a) A is diagonalizable. 


(b) A has n linearly independent eigenvectors. 


Part (b) of Theorem 5.2.1 is equivalent to saying
that there is a basis for R^n consisting of
eigenvectors of A. Why?


Proof (a) ⇒ (b) Since A is assumed to be diagonalizable, it follows that there exist an invertible matrix P and a
diagonal matrix D such that P^{-1}AP = D or, equivalently,

\[ AP = PD \tag{1} \]

If we denote the column vectors of P by p1, p2, ..., pn, and if we assume that the diagonal entries of D are
λ1, λ2, ..., λn, then by Formula 6 of Section 1.3 the left side of (1) can be expressed as

\[ AP = A[\mathbf{p}_1 \;\; \mathbf{p}_2 \;\; \cdots \;\; \mathbf{p}_n] = [A\mathbf{p}_1 \;\; A\mathbf{p}_2 \;\; \cdots \;\; A\mathbf{p}_n] \]

and, as noted in the comment following Example 1 of Section 1.7, the right side of (1) can be expressed as

\[ PD = [\lambda_1\mathbf{p}_1 \;\; \lambda_2\mathbf{p}_2 \;\; \cdots \;\; \lambda_n\mathbf{p}_n] \]

Thus, it follows from (1) that

\[ A\mathbf{p}_1 = \lambda_1\mathbf{p}_1, \quad A\mathbf{p}_2 = \lambda_2\mathbf{p}_2, \quad \ldots, \quad A\mathbf{p}_n = \lambda_n\mathbf{p}_n \tag{2} \]

Since P is invertible, we know from Theorem 5.1.6 that its column vectors p1, p2, ..., pn are linearly independent
(and hence nonzero). Thus, it follows from (2) that these n column vectors are eigenvectors of A.


Proof (b) ⇒ (a) Assume that A has n linearly independent eigenvectors, p1, p2, ..., pn, and that λ1, λ2, ..., λn are
the corresponding eigenvalues. If we let

\[ P = [\mathbf{p}_1 \;\; \mathbf{p}_2 \;\; \cdots \;\; \mathbf{p}_n] \]

and if we let D be the diagonal matrix that has λ1, λ2, ..., λn as its successive diagonal entries, then

\[ AP = A[\mathbf{p}_1 \;\; \mathbf{p}_2 \;\; \cdots \;\; \mathbf{p}_n] = [A\mathbf{p}_1 \;\; A\mathbf{p}_2 \;\; \cdots \;\; A\mathbf{p}_n] = [\lambda_1\mathbf{p}_1 \;\; \lambda_2\mathbf{p}_2 \;\; \cdots \;\; \lambda_n\mathbf{p}_n] = PD \]

Since the column vectors of P are linearly independent, it follows from Theorem 5.1.6 that P is invertible, so that
this last equation can be rewritten as P^{-1}AP = D, which shows that A is diagonalizable.


Procedure for Diagonalizing a Matrix 


The preceding theorem guarantees that an n × n matrix A with n linearly independent eigenvectors is
diagonalizable, and the proof suggests the following method for diagonalizing A.


Procedure for Diagonalizing a Matrix


Step 1. Confirm that the matrix is actually diagonalizable by finding n linearly independent eigenvectors.
One way to do this is by finding a basis for each eigenspace and merging these basis vectors into a single
set S. If this set has fewer than n vectors, then the matrix is not diagonalizable.


Step 2. Form the matrix P = [p1 p2 ... pn] that has the vectors in S as its column vectors.

Step 3. The matrix P^{-1}AP will be diagonal and have the eigenvalues λ1, λ2, ..., λn corresponding to the
eigenvectors p1, p2, ..., pn as its successive diagonal entries.


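EXAMPLE 1 Finding a Matrix P That Diagonalizes a Matrix A


Find a matrix P that diagonalizes

\[ A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix} \]

Solution In Example 7 of the preceding section we found the characteristic equation of A to be

\[ (\lambda - 1)(\lambda - 2)^2 = 0 \]

and we found the following bases for the eigenspaces:

\[ \lambda = 2: \;\; \mathbf{p}_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \;\; \mathbf{p}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}; \qquad \lambda = 1: \;\; \mathbf{p}_3 = \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} \]

There are three basis vectors in total, so the matrix

\[ P = \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} \]

diagonalizes A. As a check, you should verify that

\[ P^{-1}AP = \begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ -1 & 0 & -1 \end{bmatrix} \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix} \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]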

In general, there is no preferred order for the columns of P. Since the ith diagonal entry of P^{-1}AP is an eigenvalue
for the ith column vector of P, changing the order of the columns of P just changes the order of the eigenvalues on
the diagonal of P^{-1}AP. Thus, had we written

\[ P = \begin{bmatrix} -1 & -2 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix} \]

in the preceding example, we would have obtained

\[ P^{-1}AP = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \]


EXAMPLE 2 A Matrix That Is Not Diagonalizable


Find a matrix P that diagonalizes

\[ A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ -3 & 5 & 2 \end{bmatrix} \]

Solution The characteristic polynomial of A is

\[ \det(\lambda I - A) = \begin{vmatrix} \lambda - 1 & 0 & 0 \\ -1 & \lambda - 2 & 0 \\ 3 & -5 & \lambda - 2 \end{vmatrix} = (\lambda - 1)(\lambda - 2)^2 \]

so the characteristic equation is

\[ (\lambda - 1)(\lambda - 2)^2 = 0 \]

Thus, the distinct eigenvalues of A are λ = 1 and λ = 2. We leave it for you to show that bases for
the eigenspaces are

\[ \lambda = 1: \;\; \mathbf{p}_1 = \begin{bmatrix} \tfrac{1}{8} \\ -\tfrac{1}{8} \\ 1 \end{bmatrix}; \qquad \lambda = 2: \;\; \mathbf{p}_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \]

Since A is a 3 × 3 matrix and there are only two basis vectors in total, A is not diagonalizable.


Alternative Solution If you are interested only in determining whether a matrix is
diagonalizable and not in actually finding a diagonalizing matrix P, then it is not necessary to
compute bases for the eigenspaces; it suffices to find the dimensions of the eigenspaces. For this
example, the eigenspace corresponding to λ = 1 is the solution space of the system

\[ \begin{bmatrix} 0 & 0 & 0 \\ -1 & -1 & 0 \\ 3 & -5 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]

Since the coefficient matrix has rank 2 (verify), the nullity of this matrix is 1 by Theorem 4.8.2, and
hence the eigenspace corresponding to λ = 1 is one-dimensional.


The eigenspace corresponding to λ = 2 is the solution space of the system

\[ \begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & 0 \\ 3 & -5 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]

This coefficient matrix also has rank 2 and nullity 1 (verify), so the eigenspace corresponding to
λ = 2 is also one-dimensional. Since the eigenspaces produce a total of two basis vectors, and since
three are needed, the matrix A is not diagonalizable.
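
The dimension count in the alternative solution can be sketched as follows, assuming A has rows (1, 0, 0), (1, 2, 0), (−3, 5, 2), which is the matrix implied by the characteristic determinant in Example 2. The function names rank and eigenspace_dimension are illustrative; exact fractions avoid rounding issues in the elimination.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix, computed by forward elimination with exact fractions."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [vi - factor * vr for vi, vr in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def eigenspace_dimension(A, lam):
    """Nullity of (lam * I - A), that is, n minus its rank."""
    n = len(A)
    M = [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    return n - rank(M)

A = [[1, 0, 0], [1, 2, 0], [-3, 5, 2]]
dims = [eigenspace_dimension(A, lam) for lam in (1, 2)]
diagonalizable = sum(dims) == len(A)
print(dims, diagonalizable)
```

Comparing the sum of the eigenspace dimensions with n implements Step 1 of the diagonalization procedure without constructing the basis vectors themselves.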


There is an assumption in Example 1 that the column vectors of P, which are made up of basis vectors from the 
various eigenspaces of A, are linearly independent. The following theorem, proved at the end of this section, shows 
that this is so. 


THEOREM 5.2.2 


If v1, v2, ..., vk are eigenvectors of a matrix A corresponding to distinct eigenvalues, then
{v1, v2, ..., vk} is a linearly independent set.


Remark Theorem 5.2.2 is a special case of a more general result: Suppose that λ1, λ2, ..., λk are distinct
eigenvalues and that we choose a linearly independent set in each of the corresponding eigenspaces. If we then 
merge all these vectors into a single set, the result will still be a linearly independent set. For example, if we choose 
three linearly independent vectors from one eigenspace and two linearly independent vectors from another 
eigenspace, then the five vectors together form a linearly independent set. We omit the proof. 


As a consequence of Theorem 5.2.2, we obtain the following important result. 


THEOREM 5.2.3 


If an $n \times n$ matrix A has n distinct eigenvalues, then A is diagonalizable.


Proof If $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are eigenvectors corresponding to the distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, then by Theorem
5.2.2, $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are linearly independent. Thus, A is diagonalizable by Theorem 5.2.1.


EXAMPLE 3 Using Theorem 5.2.3


We saw in Example 3 of the preceding section that

\[
A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 4 & -17 & 8 \end{bmatrix}
\]

has three distinct eigenvalues: $\lambda = 4$, $\lambda = 2 + \sqrt{3}$, and $\lambda = 2 - \sqrt{3}$. Therefore, A is diagonalizable
and

\[
P^{-1}AP = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 2 + \sqrt{3} & 0 \\ 0 & 0 & 2 - \sqrt{3} \end{bmatrix}
\]

for some invertible matrix P. If needed, the matrix P can be found using the method shown in
Example 1 of this section.


EXAMPLE 4 Diagonalizability of Triangular Matrices


From Theorem 5.1.2, the eigenvalues of a triangular matrix are the entries on its main diagonal.
Thus, a triangular matrix with distinct entries on the main diagonal is diagonalizable. For example,

\[
A = \begin{bmatrix} -1 & 2 & 4 & 0 \\ 0 & 3 & 1 & 7 \\ 0 & 0 & 5 & 8 \\ 0 & 0 & 0 & -2 \end{bmatrix}
\]

is a diagonalizable matrix with eigenvalues $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 5$, $\lambda_4 = -2$.


Computing Powers of a Matrix 


There are many applications in which it is necessary to compute high powers of a square matrix A. We will show 
next that if A happens to be diagonalizable, then the computations can be simplified by diagonalizing A. 


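To start, suppose that A is a diagonalizable $n \times n$ matrix, that P diagonalizes A, and that

\[
P^{-1}AP = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = D
\]

Squaring both sides of this equation yields

\[
(P^{-1}AP)^2 = \begin{bmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^2 \end{bmatrix} = D^2
\]

We can rewrite the left side of this equation as

\[
(P^{-1}AP)^2 = P^{-1}APP^{-1}AP = P^{-1}AIAP = P^{-1}A^2P
\]

from which we obtain the relationship $P^{-1}A^2P = D^2$. More generally, if k is a positive integer, then a similar
computation will show that

\[
P^{-1}A^kP = D^k = \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}
\]

which we can rewrite as

\[
A^k = PD^kP^{-1} \tag{3}
\]

Formula 3 reveals that raising a diagonalizable
matrix A to a positive integer power has the effect
of raising its eigenvalues to that power.


Note that computing the right side of this formula involves only three matrix multiplications and the powers of the
diagonal entries of D. For matrices of large size and high powers of k, this involves substantially fewer operations
than computing $A^k$ directly.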


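EXAMPLE 5 Power of a Matrix


Use 3 to find $A^{13}$, where

\[
A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}
\]

Solution We showed in Example 1 that the matrix A is diagonalized by

\[
P = \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}
\]

and that

\[
D = P^{-1}AP = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

Thus, it follows from 3 that

\[
A^{13} = PD^{13}P^{-1} =
\begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 2^{13} & 0 & 0 \\ 0 & 2^{13} & 0 \\ 0 & 0 & 1^{13} \end{bmatrix}
\begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ -1 & 0 & -1 \end{bmatrix}
= \begin{bmatrix} -8190 & 0 & -16382 \\ 8191 & 8192 & 8191 \\ 8191 & 0 & 16383 \end{bmatrix} \tag{4}
\]

Remark With the method in the preceding example, most of the work is in diagonalizing A. Once that work is
done, it can be used to compute any power of A. Thus, to compute $A^{1000}$ we need only change the exponents from
13 to 1000 in 4.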
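
As a rough check of Formula 3, the following Python sketch recomputes the power in Example 5 and compares it with repeated multiplication. The helper names `matmul` and `power_via_diagonalization` are ours, and the diagonal entries 2, 2, 1 of D are taken as an assumption consistent with the result displayed in 4.

```python
def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

P = [[-1, 0, -2], [0, 1, 1], [1, 0, 1]]
P_inv = [[1, 0, 2], [1, 1, 1], [-1, 0, -1]]  # inverse of P
eigenvalues = [2, 2, 1]                       # assumed diagonal of D = P^{-1} A P

def power_via_diagonalization(k):
    """A^k = P D^k P^{-1}: only the diagonal entries are raised to the power k."""
    n = len(eigenvalues)
    d_k = [[eigenvalues[i] ** k if i == j else 0 for j in range(n)]
           for i in range(n)]
    return matmul(matmul(P, d_k), P_inv)

A = power_via_diagonalization(1)     # recovers A itself
A13 = power_via_diagonalization(13)

# Cross-check against twelve direct multiplications by A.
direct = A
for _ in range(12):
    direct = matmul(direct, A)
```

Changing the argument from 13 to 1000 costs the same two matrix products, which is the point of the Remark; the direct loop would need 999 products instead.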


Eigenvalues of Powers of a Matrix 


Once the eigenvalues and eigenvectors of any square matrix A are found, it is a simple matter to find the
eigenvalues and eigenvectors of any positive integer power of A. For example, if $\lambda$ is an eigenvalue of A and x is a
corresponding eigenvector, then

\[
A^2\mathbf{x} = A(A\mathbf{x}) = A(\lambda\mathbf{x}) = \lambda(A\mathbf{x}) = \lambda(\lambda\mathbf{x}) = \lambda^2\mathbf{x}
\]

which shows not only that $\lambda^2$ is an eigenvalue of $A^2$ but that x is a corresponding eigenvector. In general, we have
the following result.


Note that diagonalizability is not a requirement in
Theorem 5.2.4.


THEOREM 5.2.4


If $\lambda$ is an eigenvalue of a square matrix A and x is a corresponding eigenvector, and if k is any positive
integer, then $\lambda^k$ is an eigenvalue of $A^k$ and x is a corresponding eigenvector.


Some problems that use this theorem are given in the exercises. 


Geometric and Algebraic Multiplicity 


Theorem 5.2.3 does not completely settle the diagonalizability question since it only guarantees that a square 
matrix with n distinct eigenvalues is diagonalizable, but does not preclude the possibility that there may exist 
diagonalizable matrices with fewer than n distinct eigenvalues. The following example shows that this is indeed the 
case. 


EXAMPLE 6 The Converse of Theorem 5.2.3 Is False


Consider the matrices

\[
I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad\text{and}\quad
J = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\]

It follows from Theorem 5.1.2 that both of these matrices have only one distinct eigenvalue, namely
$\lambda = 1$, and hence only one eigenspace. We leave it as an exercise for you to solve the characteristic
equations

\[
(\lambda I - I)\mathbf{x} = \mathbf{0} \quad\text{and}\quad (\lambda I - J)\mathbf{x} = \mathbf{0}
\]

with $\lambda = 1$ and show that for I the eigenspace is three-dimensional (all of $R^3$) and for J it is
one-dimensional, consisting of all scalar multiples of

\[
\mathbf{x} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}
\]

This shows that the converse of Theorem 5.2.3 is false, since we have produced two $3 \times 3$ matrices
with fewer than three distinct eigenvalues, one of which is diagonalizable and the other of which is
not.


A full excursion into the study of diagonalizability is left for more advanced courses, but we will touch on one
theorem that is important to a fuller understanding of diagonalizability. It can be proved that if $\lambda_0$ is an eigenvalue
of A, then the dimension of the eigenspace corresponding to $\lambda_0$ cannot exceed the number of times that $\lambda - \lambda_0$
appears as a factor of the characteristic polynomial of A. For example, in Example 1 and Example 2 the
characteristic polynomial is

\[
(\lambda - 1)(\lambda - 2)^2
\]

Thus, the eigenspace corresponding to $\lambda = 1$ is at most (hence exactly) one-dimensional, and the eigenspace
corresponding to $\lambda = 2$ is at most two-dimensional. In Example 1 the eigenspace corresponding to $\lambda = 2$ actually
had dimension 2, resulting in diagonalizability, but in Example 2 the eigenspace corresponding to $\lambda = 2$ had only
dimension 1, resulting in nondiagonalizability.


There is some terminology that is related to these ideas. If $\lambda_0$ is an eigenvalue of an $n \times n$ matrix A, then the
dimension of the eigenspace corresponding to $\lambda_0$ is called the geometric multiplicity of $\lambda_0$, and the number of
times that $\lambda - \lambda_0$ appears as a factor in the characteristic polynomial of A is called the algebraic multiplicity of $\lambda_0$.
The following theorem, which we state without proof, summarizes the preceding discussion.


THEOREM 5.2.5 Geometric and Algebraic Multiplicity 


If A is a square matrix, then: 
(a) For every eigenvalue of A, the geometric multiplicity is less than or equal to the algebraic multiplicity. 


(b) A is diagonalizable if and only if the geometric multiplicity of every eigenvalue is equal to the 
algebraic multiplicity. 


OPTIONAL 


We will complete this section with an optional proof of Theorem 5.2.2. 


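Proof of Theorem 5.2.2 Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ be eigenvectors of A corresponding to distinct eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_k$. We will assume that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are linearly dependent and obtain a contradiction. We can then
conclude that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are linearly independent.


Since an eigenvector is nonzero by definition, $\{\mathbf{v}_1\}$ is linearly independent. Let r be the largest integer such that
$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is linearly independent. Since we are assuming that $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is linearly dependent, r
satisfies $1 \le r < k$. Moreover, by the definition of r, $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r+1}\}$ is linearly dependent. Thus, there are
scalars $c_1, c_2, \ldots, c_{r+1}$, not all zero, such that

\[
c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_{r+1}\mathbf{v}_{r+1} = \mathbf{0} \tag{5}
\]

Multiplying both sides of 5 by A and using the fact that

\[
A\mathbf{v}_1 = \lambda_1\mathbf{v}_1, \quad A\mathbf{v}_2 = \lambda_2\mathbf{v}_2, \quad \ldots, \quad A\mathbf{v}_{r+1} = \lambda_{r+1}\mathbf{v}_{r+1}
\]

we obtain

\[
c_1\lambda_1\mathbf{v}_1 + c_2\lambda_2\mathbf{v}_2 + \cdots + c_{r+1}\lambda_{r+1}\mathbf{v}_{r+1} = \mathbf{0} \tag{6}
\]

If we now multiply both sides of 5 by $\lambda_{r+1}$ and subtract the resulting equation from 6 we obtain

\[
c_1(\lambda_1 - \lambda_{r+1})\mathbf{v}_1 + c_2(\lambda_2 - \lambda_{r+1})\mathbf{v}_2 + \cdots + c_r(\lambda_r - \lambda_{r+1})\mathbf{v}_r = \mathbf{0}
\]

Since $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a linearly independent set, this equation implies that

\[
c_1(\lambda_1 - \lambda_{r+1}) = c_2(\lambda_2 - \lambda_{r+1}) = \cdots = c_r(\lambda_r - \lambda_{r+1}) = 0
\]

and since $\lambda_1, \lambda_2, \ldots, \lambda_{r+1}$ are assumed to be distinct, it follows that

\[
c_1 = c_2 = \cdots = c_r = 0 \tag{7}
\]

Substituting these values in 5 yields

\[
c_{r+1}\mathbf{v}_{r+1} = \mathbf{0}
\]

Since the eigenvector $\mathbf{v}_{r+1}$ is nonzero, it follows that

\[
c_{r+1} = 0 \tag{8}
\]

But equations 7 and 8 contradict the fact that $c_1, c_2, \ldots, c_{r+1}$ are not all zero, so the proof is complete.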


Concept Review

• Similarity transformation
• Similarity invariant
• Similar matrices
• Diagonalizable matrix
• Geometric multiplicity
• Algebraic multiplicity


Skills

• Determine whether a square matrix A is diagonalizable.
• Diagonalize a square matrix A.
• Find powers of a matrix using similarity.
• Find the geometric multiplicity and the algebraic multiplicity of an eigenvalue.




Exercise Set 5.2 


In Exercises 1-4, show that A and B are not similar matrices. 


iia ft t] ~/t © 
a-G 2}=[3 2 


Answer: 


Possible reason: Determinants are different. 


Answer: 


5. Let A be a $6 \times 6$ matrix with characteristic equation $\lambda^2(\lambda - 1)(\lambda - 2)^3 = 0$. What are the possible dimensions
for eigenspaces of A?
Answer:


$\lambda = 0$: 1 or 2; $\lambda = 1$: 1; $\lambda = 2$: 1, 2, or 3
6. Let 


oW © 
mre 


(a) Find the eigenvalues of A. 
(b) For each eigenvalue $\lambda$, find the rank of the matrix $\lambda I - A$.


(c) Is A diagonalizable? Justify your conclusion. 
In Exercises 7-11, use the method of Exercise 6 to determine whether the matrix is diagonalizable. 
7. |! | 
1 2 
Answer: 


Not diagonalizable 


Answer: 


Not diagonalizable 


w.f-1 0 1 
ais 
oe ce 


.}2 -1 0 1 
0 21 <1 
G- “Gos 2 
0 O00 3 


Answer: 


Not diagonalizable 


In Exercises 12-15, find a matrix P that diagonalizes A, and compute $P^{-1}AP$.


ma=| oe e 


=20 1? 
13. 1 0 
A= 
le —] 
Answer: 
big ‘ i 0 
e=.3 pa =| i 
1 1 a 
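14.
\[
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix}
\]
15.
\[
A = \begin{bmatrix} 2 & 0 & -2 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}
\]
Answer:

\[
P = \begin{bmatrix} -2 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \quad
P^{-1}AP = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}
\]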


In Exercises 16-21, find the geometric and algebraic multiplicity of each eigenvalue of the matrix A, and
determine whether A is diagonalizable. If A is diagonalizable, then find a matrix P that diagonalizes A, and find
$P^{-1}AP$.


Answer: 


Answer: 


re: | 
———— nom 
ooom | 
COMnT OoONmMS 
Ser ae 
NOCO Noo S 
| | 
_—— es | 
II II 
xq xq 
Ss - 
a a 


Answer: 


0 0 
0 0 
3 0 
0 3 


0 
—2 
0 
0 


22. Use the method of Example 5 to compute 412, where 


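23. Use the method of Example 5 to compute $A^{11}$, where

\[
A = \begin{bmatrix} -1 & 7 & -1 \\ 0 & 1 & 0 \\ 0 & 15 & -2 \end{bmatrix}
\]

Answer:

\[
\begin{bmatrix} -1 & 10237 & -2047 \\ 0 & 1 & 0 \\ 0 & 10245 & -2048 \end{bmatrix}
\]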
24. In each part, compute the stated power of

\[
A = \begin{bmatrix} 1 & -2 & 8 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}
\]


(a) 4 iooo (b) A7t000 (c) (42sul (d) 47230 


25. Find A” ifn is a positive integer and 


3-1 0 
Fa Ps eee a 
a 
Answer: 
pe | 
1 1 111" 0 off 3 6 
A" =Pp"Pt=|2 oO -1l|0 3" 0 ; 0 -4 
em | Game i | a, Ca 
3 
26. Let

\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

Show that


(a) A is diagonalizable if $(a - d)^2 + 4bc > 0$.
(b) A is not diagonalizable if $(a - d)^2 + 4bc < 0$.
[Hint: See Exercise 19 of Section 5.1.]
27. In the case where the matrix A in Exercise 26 is diagonalizable, find a matrix P that diagonalizes A. [Hint: See 
Exercise 20 of Section 5.1.] 


Answer: 


One possibility is

\[
P = \begin{bmatrix} -b & -b \\ a - \lambda_1 & a - \lambda_2 \end{bmatrix}
\]

where $\lambda_1$ and $\lambda_2$ are as in Exercise 20 of Section 5.1.


28. Prove that similar matrices have the same rank. 


29. Prove that similar matrices have the same nullity. 


30. Prove that similar matrices have the same trace.

31. Prove that if A is diagonalizable, then so is $A^k$ for every positive integer k.

32. Prove that if A is a diagonalizable matrix, then the rank of A is the number of nonzero eigenvalues of A.

33. Suppose that the characteristic polynomial of some matrix A is found to be $p(\lambda) = (\lambda - 1)(\lambda - 3)^2(\lambda - 4)^3$.
In each part, answer the question and explain your reasoning.

(a) What can you say about the dimensions of the eigenspaces of A?

(b) What can you say about the dimensions of the eigenspaces if you know that A is diagonalizable?

(c) If $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a linearly independent set of eigenvectors of A all of which correspond to the same
eigenvalue of A, what can you say about the eigenvalue?


Answer:


(a) $\lambda = 1$: dimension $= 1$; $\lambda = 3$: dimension $\le 2$; $\lambda = 4$: dimension $\le 3$

(b) Dimensions will be exactly 1, 2, and 3.

(c) $\lambda = 4$

34. This problem will lead you through a proof of the fact that the algebraic multiplicity of an eigenvalue of an
$n \times n$ matrix A is greater than or equal to the geometric multiplicity. For this purpose, assume that $\lambda_0$ is an
eigenvalue with geometric multiplicity k.


(a) Prove that there is a basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ for $R^n$ in which the first k vectors of B form a basis for the
eigenspace corresponding to $\lambda_0$.


(b) Let P be the matrix having the vectors in B as columns. Prove that the product AP can be expressed as

\[
AP = P\begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix}
\]

[Hint: Compare the first k column vectors on both sides.]
(c) Use the result in part (b) to prove that A is similar to

\[
C = \begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix}
\]

and hence that A and C have the same characteristic polynomial.


(d) By considering $\det(\lambda I - C)$, prove that the characteristic polynomial of C (and hence A) contains the
factor $(\lambda - \lambda_0)$ at least k times, thereby proving that the algebraic multiplicity of $\lambda_0$ is greater than or equal
to the geometric multiplicity k.


True-False Exercises 


In parts (a)-(h) determine whether the statement is true or false, and justify your answer. 


(a) Every square matrix is similar to itself. 


Answer: 


True 


(b) If A, B, and C are matrices for which A is similar to B and B is similar to C, then A is similar to C.


Answer: 


True 


(c) If A and B are similar invertible matrices, then $A^{-1}$ and $B^{-1}$ are similar.


Answer: 


True 


(d) If A is diagonalizable, then there is a unique matrix P such that $P^{-1}AP$ is diagonal.


Answer: 


False 


(e) If A is diagonalizable and invertible, then $A^{-1}$ is diagonalizable.
Answer: 


True 


(f) If A is diagonalizable, then $A^T$ is diagonalizable.
Answer: 


True 


(g) If there is a basis for $R^n$ consisting of eigenvectors of an $n \times n$ matrix A, then A is diagonalizable.
Answer: 


True 


(h) If every eigenvalue of a matrix A has algebraic multiplicity 1, then A is diagonalizable. 
Answer: 


True 




5.3 Complex Vector Spaces 


Because the characteristic equation of any square matrix can have complex solutions, the notions of complex eigenvalues and 
eigenvectors arise naturally, even within the context of matrices with real entries. In this section we will discuss this idea and 
apply our results to study symmetric matrices in more detail. A review of the essentials of complex numbers appears in the 
back of this text. 


Review of Complex Numbers 


Recall that if $z = a + bi$ is a complex number, then:


• $\operatorname{Re}(z) = a$ and $\operatorname{Im}(z) = b$ are called the real part of z and the imaginary part of z, respectively,
• $|z| = \sqrt{a^2 + b^2}$ is called the modulus (or absolute value) of z,
• $\bar{z} = a - bi$ is called the complex conjugate of z,
• $z\bar{z} = a^2 + b^2 = |z|^2$,
• the angle $\phi$ in Figure 5.3.1 is called an argument of z,
• $\operatorname{Re}(z) = |z|\cos\phi$,
• $\operatorname{Im}(z) = |z|\sin\phi$,
• $z = |z|(\cos\phi + i\sin\phi)$ is called the polar form of z.


[Figure: the complex number $z = a + bi$ plotted in the plane, with $\operatorname{Re}(z) = a$ and $\operatorname{Im}(z) = b$]

Figure 5.3.1


Complex Eigenvalues 
In Formula 3 of Section 5.1 we observed that the characteristic equation of a general $n \times n$ matrix A has the form

\[
\lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0 \tag{1}
\]

in which the highest power of $\lambda$ has a coefficient of 1. Up to now we have limited our discussion to matrices in which the
solutions of 1 are real numbers. However, it is possible for the characteristic equation of a matrix A with real entries to have
imaginary solutions; for example, the characteristic equation of the matrix

\[
A = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix}
\]

is

\[
\begin{vmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{vmatrix} = \lambda^2 + 1 = 0
\]

which has the imaginary solutions $\lambda = i$ and $\lambda = -i$. To deal with this case we will need to explore the notion of a complex
vector space and some related ideas.


Vectors in C” 


A vector space in which scalars are allowed to be complex numbers is called a complex vector space. In this section we will 
be concerned only with the following complex generalization of the real vector space $R^n$.


DEFINITION 1 


If n is a positive integer, then a complex n-tuple is a sequence of n complex numbers $(v_1, v_2, \ldots, v_n)$. The set of all
complex n-tuples is called complex n-space and is denoted by $C^n$. Scalars are complex numbers, and the operations
of addition, subtraction, and scalar multiplication are performed componentwise.


The terminology used for n-tuples of real numbers applies to complex n-tuples without change. Thus, if $v_1, v_2, \ldots, v_n$ are
complex numbers, then we call $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ a vector in $C^n$ and $v_1, v_2, \ldots, v_n$ its components. Some examples of
vectors in $C^3$ are


u=(1+i, -4i,3+2), v=(0,i,5), w= (6 — y2%,9 zm) 


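Every vector

\[
\mathbf{v} = (v_1, v_2, \ldots, v_n) = (a_1 + b_1 i, a_2 + b_2 i, \ldots, a_n + b_n i)
\]

in $C^n$ can be split into real and imaginary parts as

\[
\mathbf{v} = (a_1, a_2, \ldots, a_n) + i(b_1, b_2, \ldots, b_n)
\]

which we also denote as

\[
\mathbf{v} = \operatorname{Re}(\mathbf{v}) + i\operatorname{Im}(\mathbf{v})
\]

where

\[
\operatorname{Re}(\mathbf{v}) = (a_1, a_2, \ldots, a_n) \quad\text{and}\quad \operatorname{Im}(\mathbf{v}) = (b_1, b_2, \ldots, b_n)
\]

The vector

\[
\bar{\mathbf{v}} = (\bar{v}_1, \bar{v}_2, \ldots, \bar{v}_n) = (a_1 - b_1 i, a_2 - b_2 i, \ldots, a_n - b_n i)
\]

is called the complex conjugate of v and can be expressed in terms of Re(v) and Im(v) as

\[
\bar{\mathbf{v}} = (a_1, a_2, \ldots, a_n) - i(b_1, b_2, \ldots, b_n) = \operatorname{Re}(\mathbf{v}) - i\operatorname{Im}(\mathbf{v}) \tag{2}
\]

It follows that the vectors in $R^n$ can be viewed as those vectors in $C^n$ whose imaginary part is zero; or stated another way, a
vector v in $C^n$ is in $R^n$ if and only if $\bar{\mathbf{v}} = \mathbf{v}$.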


In this section we will also need to consider matrices with complex entries, so henceforth we will call a matrix A a real matrix 
if its entries are required to be real numbers and a complex matrix if its entries are allowed to be complex numbers. The 
standard operations on real matrices carry over to complex matrices without change, and all of the familiar properties of 
matrices continue to hold. 


If A is a complex matrix, then Re(A) and Im(A) are the matrices formed from the real and imaginary parts of the entries of A,
and $\bar{A}$ is the matrix formed by taking the complex conjugate of each entry in A.


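EXAMPLE 1 Real and Imaginary Parts of Vectors and Matrices


Let

\[
\mathbf{v} = (3 + i, -2i, 5) \quad\text{and}\quad A = \begin{bmatrix} 1 + i & -i \\ 4 & 6 - 2i \end{bmatrix}
\]

Then

\[
\bar{\mathbf{v}} = (3 - i, 2i, 5), \quad \operatorname{Re}(\mathbf{v}) = (3, 0, 5), \quad \operatorname{Im}(\mathbf{v}) = (1, -2, 0)
\]

\[
\bar{A} = \begin{bmatrix} 1 - i & i \\ 4 & 6 + 2i \end{bmatrix}, \quad
\operatorname{Re}(A) = \begin{bmatrix} 1 & 0 \\ 4 & 6 \end{bmatrix}, \quad
\operatorname{Im}(A) = \begin{bmatrix} 1 & -1 \\ 0 & -2 \end{bmatrix}
\]

\[
\det(A) = \begin{vmatrix} 1 + i & -i \\ 4 & 6 - 2i \end{vmatrix} = (1 + i)(6 - 2i) - (-i)(4) = 8 + 8i
\]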

Algebraic Properties of the Complex Conjugate 


The next two theorems list some properties of complex vectors and matrices that we will need in this section. Some of the 
proofs are given as exercises. 


THEOREM 5.3.1 


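If u and v are vectors in $C^n$, and if k is a scalar, then:
(a) $\overline{\overline{\mathbf{u}}} = \mathbf{u}$

(b) $\overline{k\mathbf{u}} = \bar{k}\,\bar{\mathbf{u}}$

(c) $\overline{\mathbf{u} + \mathbf{v}} = \bar{\mathbf{u}} + \bar{\mathbf{v}}$

(d) $\overline{\mathbf{u} - \mathbf{v}} = \bar{\mathbf{u}} - \bar{\mathbf{v}}$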


THEOREM 5.3.2 


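If A is an $m \times k$ complex matrix and B is a $k \times n$ complex matrix, then:

(a) $\overline{\overline{A}} = A$

(b) $\overline{(A^T)} = (\overline{A})^T$

(c) $\overline{AB} = \overline{A}\,\overline{B}$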


The Complex Euclidean Inner Product 


The following definition extends the notions of dot product and norm to $C^n$.


DEFINITION 2 


If $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are vectors in $C^n$, then the complex Euclidean inner product of u
and v (also called the complex dot product) is denoted by $\mathbf{u} \cdot \mathbf{v}$ and is defined as

\[
\mathbf{u} \cdot \mathbf{v} = u_1\bar{v}_1 + u_2\bar{v}_2 + \cdots + u_n\bar{v}_n \tag{3}
\]

We also define the Euclidean norm on $C^n$ to be

\[
\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{|v_1|^2 + |v_2|^2 + \cdots + |v_n|^2} \tag{4}
\]

As in the real case, we call v a unit vector in $C^n$ if $\|\mathbf{v}\| = 1$, and we say two vectors u and v are orthogonal if $\mathbf{u} \cdot \mathbf{v} = 0$.

The complex conjugates in 3 ensure that $\|\mathbf{v}\|$ is a real
number, for without them the quantity $\mathbf{v} \cdot \mathbf{v}$ in 4 might
be imaginary.


EXAMPLE 2 Complex Euclidean Inner Product and Norm 


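Find $\mathbf{u} \cdot \mathbf{v}$, $\mathbf{v} \cdot \mathbf{u}$, $\|\mathbf{u}\|$, and $\|\mathbf{v}\|$ for the vectors

\[
\mathbf{u} = (1 + i, i, 3 - i) \quad\text{and}\quad \mathbf{v} = (1 + i, 2, 4i)
\]

Solution

\[
\mathbf{u} \cdot \mathbf{v} = (1 + i)\overline{(1 + i)} + i\,\overline{(2)} + (3 - i)\overline{(4i)} = (1 + i)(1 - i) + 2i + (3 - i)(-4i) = -2 - 10i
\]

\[
\mathbf{v} \cdot \mathbf{u} = (1 + i)\overline{(1 + i)} + 2\,\overline{(i)} + (4i)\overline{(3 - i)} = (1 + i)(1 - i) - 2i + 4i(3 + i) = -2 + 10i
\]

\[
\|\mathbf{u}\| = \sqrt{|1 + i|^2 + |i|^2 + |3 - i|^2} = \sqrt{2 + 1 + 10} = \sqrt{13}
\]

\[
\|\mathbf{v}\| = \sqrt{|1 + i|^2 + |2|^2 + |4i|^2} = \sqrt{2 + 4 + 16} = \sqrt{22}
\]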
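
The arithmetic in Example 2 can be reproduced with Python's built-in complex numbers. This is a small sketch only; the function names `cdot` and `norm` are our own and simply implement Formulas 3 and 4.

```python
import math

def cdot(u, v):
    """Complex Euclidean inner product (Formula 3): sum of u_k * conj(v_k)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def norm(v):
    """Euclidean norm on C^n (Formula 4); v . v is real, so take its real part."""
    return math.sqrt(cdot(v, v).real)

u = [1 + 1j, 1j, 3 - 1j]
v = [1 + 1j, 2, 4j]

uv = cdot(u, v)   # u . v
vu = cdot(v, u)   # v . u, the complex conjugate of u . v
```

Conjugating the second argument rather than the first follows the convention of Formula 3; some other references conjugate the first argument instead, which swaps the roles of the two vectors.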


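Recall from Table 1 of Section 3.2 that if u and v are column vectors in $R^n$, then their dot product can be expressed as

\[
\mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T\mathbf{v} = \mathbf{v}^T\mathbf{u}
\]

The analogous formulas in $C^n$ are (verify)

\[
\mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T\bar{\mathbf{v}} = \bar{\mathbf{v}}^T\mathbf{u} \tag{5}
\]

Example 2 reveals a major difference between the dot product on $R^n$ and the complex dot product on $C^n$. For the dot product
on $R^n$ we always have $\mathbf{v} \cdot \mathbf{u} = \mathbf{u} \cdot \mathbf{v}$ (the symmetry property), but for the complex dot product the corresponding relationship is
given by $\mathbf{u} \cdot \mathbf{v} = \overline{\mathbf{v} \cdot \mathbf{u}}$, which is called its antisymmetry property. The following theorem is an analog of Theorem 3.2.2.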


THEOREM 5.3.3 


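If u, v, and w are vectors in $C^n$, and if k is a scalar, then the complex Euclidean inner product has the following
properties:

(a) $\mathbf{u} \cdot \mathbf{v} = \overline{\mathbf{v} \cdot \mathbf{u}}$ [Antisymmetry property]

(b) $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}$ [Distributive property]

(c) $k(\mathbf{u} \cdot \mathbf{v}) = (k\mathbf{u}) \cdot \mathbf{v}$ [Homogeneity property]

(d) $\mathbf{u} \cdot k\mathbf{v} = \bar{k}(\mathbf{u} \cdot \mathbf{v})$ [Antihomogeneity property]

(e) $\mathbf{v} \cdot \mathbf{v} \ge 0$ and $\mathbf{v} \cdot \mathbf{v} = 0$ if and only if $\mathbf{v} = \mathbf{0}$. [Positivity property]


Parts (c) and (d) of this theorem state that a scalar multiplying a complex Euclidean inner product can be regrouped with the
first vector, but to regroup it with the second vector you must first take its complex conjugate. We will prove part (d), and
leave the others as exercises.


Proof (d)

\[
k(\mathbf{u} \cdot \mathbf{v}) = k\,\overline{(\mathbf{v} \cdot \mathbf{u})} = \overline{\bar{k}}\;\overline{(\mathbf{v} \cdot \mathbf{u})} = \overline{\bar{k}(\mathbf{v} \cdot \mathbf{u})} = \overline{(\bar{k}\mathbf{v}) \cdot \mathbf{u}} = \mathbf{u} \cdot (\bar{k}\mathbf{v})
\]

To complete the proof, substitute $\bar{k}$ for k and use the fact that $\overline{\bar{k}} = k$.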


Vector Concepts in C” 


Except for the use of complex scalars, the notions of linear combination, linear independence, subspace, spanning, basis, and
dimension carry over without change to $C^n$.


Is $R^n$ a subspace of $C^n$? Explain.

Eigenvalues and eigenvectors are defined for complex matrices exactly as for real matrices. If A is an $n \times n$ matrix with
complex entries, then the complex roots of the characteristic equation $\det(\lambda I - A) = 0$ are called complex eigenvalues of A.
As in the real case, $\lambda$ is a complex eigenvalue of A if and only if there exists a nonzero vector x in $C^n$ such that $A\mathbf{x} = \lambda\mathbf{x}$.
Each such x is called a complex eigenvector of A corresponding to $\lambda$. The complex eigenvectors of A corresponding to $\lambda$ are
the nonzero solutions of the linear system $(\lambda I - A)\mathbf{x} = \mathbf{0}$, and the set of all such solutions is a subspace of $C^n$, called the
eigenspace of A corresponding to $\lambda$.


The following theorem states that if a real matrix has complex eigenvalues, then those eigenvalues and their corresponding 
eigenvectors occur in conjugate pairs. 


THEOREM 5.3.4 


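If $\lambda$ is an eigenvalue of a real $n \times n$ matrix A, and if x is a corresponding eigenvector, then $\bar{\lambda}$ is also an eigenvalue of A,
and $\bar{\mathbf{x}}$ is a corresponding eigenvector.


Proof Since $\lambda$ is an eigenvalue of A and x is a corresponding eigenvector, we have

\[
\overline{A\mathbf{x}} = \overline{\lambda\mathbf{x}} = \bar{\lambda}\bar{\mathbf{x}} \tag{6}
\]

However, $\bar{A} = A$, since A has real entries, so it follows from part (c) of Theorem 5.3.2 that

\[
\overline{A\mathbf{x}} = \bar{A}\bar{\mathbf{x}} = A\bar{\mathbf{x}} \tag{7}
\]

Equations 6 and 7 together imply that

\[
A\bar{\mathbf{x}} = \overline{A\mathbf{x}} = \bar{\lambda}\bar{\mathbf{x}}
\]

in which $\bar{\mathbf{x}} \ne \mathbf{0}$ (why?); this tells us that $\bar{\lambda}$ is an eigenvalue of A and $\bar{\mathbf{x}}$ is a corresponding eigenvector.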


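EXAMPLE 3 Complex Eigenvalues and Eigenvectors


Find the eigenvalues and bases for the eigenspaces of

\[
A = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix}
\]

Solution The characteristic polynomial of A is

\[
\begin{vmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{vmatrix} = \lambda^2 + 1 = (\lambda - i)(\lambda + i)
\]

so the eigenvalues of A are $\lambda = i$ and $\lambda = -i$. Note that these eigenvalues are complex conjugates, as
guaranteed by Theorem 5.3.4.


To find the eigenvectors we must solve the system

\[
\begin{bmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]

with $\lambda = i$ and then with $\lambda = -i$. With $\lambda = i$, this system becomes

\[
\begin{bmatrix} i + 2 & 1 \\ -5 & i - 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{8}
\]

We could solve this system by reducing the augmented matrix

\[
\left[\begin{array}{cc|c} i + 2 & 1 & 0 \\ -5 & i - 2 & 0 \end{array}\right] \tag{9}
\]

to reduced row echelon form by Gauss-Jordan elimination, though the complex arithmetic is somewhat tedious.
A simpler procedure here is first to observe that the reduced row echelon form of 9 must have a row of zeros
because 8 has nontrivial solutions. This being the case, each row of 9 must be a scalar multiple of the other, and
hence the first row can be made into a row of zeros by adding a suitable multiple of the second row to it.
Accordingly, we can simply set the entries in the first row to zero, then interchange the rows, and then multiply
the new first row by $-\tfrac{1}{5}$ to obtain the reduced row echelon form

\[
\left[\begin{array}{cc|c} 1 & \tfrac{2}{5} - \tfrac{1}{5}i & 0 \\ 0 & 0 & 0 \end{array}\right]
\]

Thus, a general solution of the system is

\[
x_1 = \left(-\tfrac{2}{5} + \tfrac{1}{5}i\right)t, \quad x_2 = t
\]

This tells us that the eigenspace corresponding to $\lambda = i$ is one-dimensional and consists of all complex scalar
multiples of the basis vector

\[
\mathbf{x} = \begin{bmatrix} -\tfrac{2}{5} + \tfrac{1}{5}i \\ 1 \end{bmatrix} \tag{10}
\]

As a check, let us confirm that $A\mathbf{x} = i\mathbf{x}$. We obtain

\[
A\mathbf{x} = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix}
\begin{bmatrix} -\tfrac{2}{5} + \tfrac{1}{5}i \\ 1 \end{bmatrix}
= \begin{bmatrix} -2\left(-\tfrac{2}{5} + \tfrac{1}{5}i\right) - 1 \\ 5\left(-\tfrac{2}{5} + \tfrac{1}{5}i\right) + 2 \end{bmatrix}
= \begin{bmatrix} -\tfrac{1}{5} - \tfrac{2}{5}i \\ i \end{bmatrix} = i\mathbf{x}
\]

We could find a basis for the eigenspace corresponding to $\lambda = -i$ in a similar way, but the work is unnecessary,
since Theorem 5.3.4 implies that

\[
\bar{\mathbf{x}} = \begin{bmatrix} -\tfrac{2}{5} - \tfrac{1}{5}i \\ 1 \end{bmatrix} \tag{11}
\]

must be a basis for this eigenspace. The following computations confirm that $\bar{\mathbf{x}}$ is an eigenvector of A
corresponding to $\lambda = -i$:

\[
A\bar{\mathbf{x}} = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix}
\begin{bmatrix} -\tfrac{2}{5} - \tfrac{1}{5}i \\ 1 \end{bmatrix}
= \begin{bmatrix} -2\left(-\tfrac{2}{5} - \tfrac{1}{5}i\right) - 1 \\ 5\left(-\tfrac{2}{5} - \tfrac{1}{5}i\right) + 2 \end{bmatrix}
= \begin{bmatrix} -\tfrac{1}{5} + \tfrac{2}{5}i \\ -i \end{bmatrix} = -i\bar{\mathbf{x}}
\]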
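
The two checks at the end of Example 3 can also be carried out numerically. The sketch below is illustrative only: `apply` and `is_eigenpair` are hypothetical helper names, floating-point complex numbers are used (hence the small tolerance), and the matrix is the one whose characteristic polynomial is computed in the example.

```python
A = [[-2, -1],
     [5, 2]]

def apply(m, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def is_eigenpair(m, lam, x, tol=1e-12):
    """True if m x equals lam x componentwise, up to a rounding tolerance."""
    return all(abs(y - lam * c) < tol for y, c in zip(apply(m, x), x))

x = [complex(-2, 1) / 5, 1]           # basis vector from Formula 10
x_bar = [c.conjugate() for c in x]    # its conjugate, Formula 11

checks = (is_eigenpair(A, 1j, x), is_eigenpair(A, -1j, x_bar))
```

Because A is real, conjugating an eigenvector for $\lambda = i$ yields one for $\lambda = -i$, which is what Theorem 5.3.4 predicts.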


Since a number of our subsequent examples will involve $2 \times 2$ matrices with real entries, it will be useful to discuss some
general results about the eigenvalues of such matrices. Observe first that the characteristic polynomial of the matrix

\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

is

\[
\det(\lambda I - A) = \begin{vmatrix} \lambda - a & -b \\ -c & \lambda - d \end{vmatrix} = (\lambda - a)(\lambda - d) - bc = \lambda^2 - (a + d)\lambda + (ad - bc)
\]

We can express this in terms of the trace and determinant of A as

\[
\det(\lambda I - A) = \lambda^2 - \operatorname{tr}(A)\lambda + \det(A) \tag{12}
\]

from which it follows that the characteristic equation of A is

\[
\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0 \tag{13}
\]

Now recall from algebra that if $ax^2 + bx + c = 0$ is a quadratic equation with real coefficients, then the discriminant
$b^2 - 4ac$ determines the nature of the roots:

$b^2 - 4ac > 0$ [Two distinct real roots]
$b^2 - 4ac = 0$ [One repeated real root]
$b^2 - 4ac < 0$ [Two conjugate imaginary roots]

Applying this to 13 with $a = 1$, $b = -\operatorname{tr}(A)$, and $c = \det(A)$ yields the following theorem.


Olga Taussky-Todd (1906-1995) 


Historical Note Olga Taussky-Todd was one of the pioneering women in matrix analysis and the first woman 
appointed to the faculty at the California Institute of Technology. She worked at the National Physical Laboratory in 
London during World War II, where she was assigned to study flutter in supersonic aircraft. While there, she realized 
that some results about the eigenvalues of a certain $6 \times 6$ complex matrix could be used to answer key questions about
the flutter problem that would otherwise have required laborious calculation. After World War II Olga Taussky-Todd 
continued her work on matrix-related subjects and helped to draw many known but disparate results about matrices 
into the coherent subject that we now call matrix theory. 

[Image: Courtesy of the Archives, California Institute of Technology] 


THEOREM 5.3.5 


If A is a $2 \times 2$ matrix with real entries, then the characteristic equation of A is $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$ and
(a) A has two distinct real eigenvalues if $\operatorname{tr}(A)^2 - 4\det(A) > 0$;
(b) A has one repeated real eigenvalue if $\operatorname{tr}(A)^2 - 4\det(A) = 0$;
(c) A has two complex conjugate eigenvalues if $\operatorname{tr}(A)^2 - 4\det(A) < 0$.


EXAMPLE 4 Eigenvalues of a $2 \times 2$ Matrix


In each part, use Formula 13 for the characteristic equation to find the eigenvalues of

(a) $A = \begin{bmatrix} 2 & 2 \\ -1 & 5 \end{bmatrix}$

(b) $A = \begin{bmatrix} 0 & -1 \\ 1 & 2 \end{bmatrix}$

(c) $A = \begin{bmatrix} 2 & 3 \\ -3 & 2 \end{bmatrix}$

Solution


(a) We have $\operatorname{tr}(A) = 7$ and $\det(A) = 12$, so the characteristic equation of A is
\[
\lambda^2 - 7\lambda + 12 = 0
\]
Factoring yields $(\lambda - 4)(\lambda - 3) = 0$, so the eigenvalues of A are $\lambda = 4$ and $\lambda = 3$.
(b) We have $\operatorname{tr}(A) = 2$ and $\det(A) = 1$, so the characteristic equation of A is
\[
\lambda^2 - 2\lambda + 1 = 0
\]
Factoring this equation yields $(\lambda - 1)^2 = 0$, so $\lambda = 1$ is the only eigenvalue of A; it has algebraic
multiplicity 2.
(c) We have $\operatorname{tr}(A) = 4$ and $\det(A) = 13$, so the characteristic equation of A is
\[
\lambda^2 - 4\lambda + 13 = 0
\]
Solving this equation by the quadratic formula yields
\[
\lambda = \frac{4 \pm \sqrt{(-4)^2 - 4(13)}}{2} = \frac{4 \pm \sqrt{-36}}{2} = 2 \pm 3i
\]
Thus, the eigenvalues of A are $\lambda = 2 + 3i$ and $\lambda = 2 - 3i$.
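
Formula 13 and Theorem 5.3.5 translate directly into a short routine. The following is a minimal sketch, assuming only the trace and determinant values quoted in the three solutions; the function name `eigenvalues_from_trace_det` is our own.

```python
import cmath

def eigenvalues_from_trace_det(tr, det):
    """Roots of lambda^2 - tr*lambda + det = 0, plus the discriminant.

    cmath.sqrt is used so that a negative discriminant yields the
    complex conjugate pair instead of raising an error.
    """
    disc = tr * tr - 4 * det
    root = cmath.sqrt(disc)
    return (tr + root) / 2, (tr - root) / 2, disc

part_a = eigenvalues_from_trace_det(7, 12)   # discriminant > 0
part_b = eigenvalues_from_trace_det(2, 1)    # discriminant = 0
part_c = eigenvalues_from_trace_det(4, 13)   # discriminant < 0
```

The sign of the returned discriminant tells which case of Theorem 5.3.5 applies, so the three parts of the example cover all three cases.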


Symmetric Matrices Have Real Eigenvalues 


Our next result, which is concerned with the eigenvalues of real symmetric matrices, is important in a wide variety of 
applications. The key to its proof is to think of a real symmetric matrix as a complex matrix whose entries have an imaginary 
part of zero. 


THEOREM 5.3.6 


If A is a real symmetric matrix, then A has real eigenvalues. 


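Proof Suppose that $\lambda$ is an eigenvalue of A and x is a corresponding eigenvector, where we allow for the possibility that $\lambda$ is
complex and x is in $C^n$. Thus,

\[
A\mathbf{x} = \lambda\mathbf{x}
\]

where $\mathbf{x} \ne \mathbf{0}$. If we multiply both sides of this equation by $\bar{\mathbf{x}}^T$ and use the fact that

\[
\bar{\mathbf{x}}^T A\mathbf{x} = \bar{\mathbf{x}}^T(\lambda\mathbf{x}) = \lambda(\bar{\mathbf{x}}^T\mathbf{x}) = \lambda(\mathbf{x} \cdot \mathbf{x}) = \lambda\|\mathbf{x}\|^2
\]

then we obtain

\[
\lambda = \frac{\bar{\mathbf{x}}^T A\mathbf{x}}{\|\mathbf{x}\|^2}
\]

Since the denominator in this expression is real, we can prove that $\lambda$ is real by showing that

\[
\overline{\bar{\mathbf{x}}^T A\mathbf{x}} = \bar{\mathbf{x}}^T A\mathbf{x} \tag{14}
\]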


But, A is symmetric and has real entries, so it follows from the second equality in 14 and properties of the conjugate that 


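\[
\overline{\bar{\mathbf{x}}^T A\mathbf{x}} = \overline{\bar{\mathbf{x}}}^T\,\overline{A\mathbf{x}} = \mathbf{x}^T\,\overline{A\mathbf{x}} = (\overline{A\mathbf{x}})^T\mathbf{x} = (\bar{A}\bar{\mathbf{x}})^T\mathbf{x} = (A\bar{\mathbf{x}})^T\mathbf{x} = \bar{\mathbf{x}}^T A^T\mathbf{x} = \bar{\mathbf{x}}^T A\mathbf{x}
\]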


A Geometric Interpretation of Complex Eigenvalues 


The following theorem is the key to understanding the geometric significance of complex eigenvalues of real 2 x 2 matrices. 


THEOREM 5.3.7 


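The eigenvalues of the real matrix

\[
C = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \tag{15}
\]

are $\lambda = a \pm bi$. If a and b are not both zero, then this matrix can be factored as

\[
\begin{bmatrix} a & -b \\ b & a \end{bmatrix} =
\begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix}
\begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \tag{16}
\]

where $\phi$ is the angle from the positive x-axis to the ray that joins the origin to the point (a, b) (Figure 5.3.2).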


[Figure: the point (a, b) and the ray joining it to the origin]

Figure 5.3.2


Geometrically, this theorem states that multiplication by a matrix of form 15 can be viewed as a rotation through the angle $\phi$
followed by a scaling with factor $|\lambda|$ (Figure 5.3.3).


[Figure: a vector x, the rotated vector, and the scaled vector Cx]

Figure 5.3.3


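Proof The characteristic equation of C is $(\lambda - a)^2 + b^2 = 0$ (verify), from which it follows that the eigenvalues of C are
$\lambda = a \pm bi$. Assuming that a and b are not both zero, let $\phi$ be the angle from the positive x-axis to the ray that joins the origin
to the point (a, b). The angle $\phi$ is an argument of the eigenvalue $\lambda = a + bi$, so we see from Figure 5.3.2 that

\[
a = |\lambda|\cos\phi \quad\text{and}\quad b = |\lambda|\sin\phi
\]

It follows from this that the matrix in 15 can be written as

\[
\begin{bmatrix} a & -b \\ b & a \end{bmatrix} =
\begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix}
\begin{bmatrix} \dfrac{a}{|\lambda|} & -\dfrac{b}{|\lambda|} \\[2ex] \dfrac{b}{|\lambda|} & \dfrac{a}{|\lambda|} \end{bmatrix} =
\begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix}
\begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}
\]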

The following theorem, whose proof is considered in the exercises, shows that every real $2 \times 2$ matrix with complex
eigenvalues is similar to a matrix of form 15.


THEOREM 5.3.8 


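Let A be a real $2 \times 2$ matrix with complex eigenvalues $\lambda = a \pm bi$ (where $b \ne 0$). If x is an eigenvector of A
corresponding to $\lambda = a - bi$, then the matrix $P = [\operatorname{Re}(\mathbf{x}) \;\; \operatorname{Im}(\mathbf{x})]$ is invertible and

\[
A = P\begin{bmatrix} a & -b \\ b & a \end{bmatrix}P^{-1} \tag{17}
\]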


EXAMPLE 5 A Matrix Factorization Using Complex Eigenvalues 


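Factor the matrix in Example 3 into form 17 using the eigenvalue $\lambda = -i$ and the corresponding eigenvector
that was given in 11.


Solution For consistency with the notation in Theorem 5.3.8, let us denote the eigenvector in 11 that
corresponds to $\lambda = -i$ by x (rather than $\bar{\mathbf{x}}$ as before). For this $\lambda$ and x we have

\[
a = 0, \quad b = 1, \quad \operatorname{Re}(\mathbf{x}) = \begin{bmatrix} -\tfrac{2}{5} \\ 1 \end{bmatrix}, \quad
\operatorname{Im}(\mathbf{x}) = \begin{bmatrix} -\tfrac{1}{5} \\ 0 \end{bmatrix}
\]

Thus,

\[
P = [\operatorname{Re}(\mathbf{x}) \;\; \operatorname{Im}(\mathbf{x})] = \begin{bmatrix} -\tfrac{2}{5} & -\tfrac{1}{5} \\ 1 & 0 \end{bmatrix}
\]

so A can be factored in form 17 as

\[
\begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} =
\begin{bmatrix} -\tfrac{2}{5} & -\tfrac{1}{5} \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 1 \\ -5 & -2 \end{bmatrix}
\]

You may want to confirm this by multiplying out the right side.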
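
One way to carry out the suggested confirmation is with exact rational arithmetic. In the sketch below, `matmul` is our own helper, and the three factors are those of form 17 for the eigenvalue $\lambda = -i$ (so $a = 0$, $b = 1$), with the inverse of P worked out by hand as an assumption to be checked.

```python
from fractions import Fraction as F

def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

P = [[F(-2, 5), F(-1, 5)],   # columns are Re(x) and Im(x)
     [F(1), F(0)]]
C = [[0, -1],                # the matrix [[a, -b], [b, a]] with a = 0, b = 1
     [1, 0]]
P_inv = [[0, 1],             # claimed inverse of P, verified by P * P_inv
         [-5, -2]]

identity_check = matmul(P, P_inv)
product = matmul(matmul(P, C), P_inv)   # should reproduce A
```

Using `Fraction` avoids any rounding in the entries $-\tfrac{2}{5}$ and $-\tfrac{1}{5}$, so the comparison with A can be an exact equality.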


A Geometric Interpretation of Theorem 5.3.8 


To clarify what Theorem 5.3.8 says geometrically, let us denote the matrices on the right side of 16 by S and $R_\phi$, respectively,
and then use 16 to rewrite 17 as

\[
A = PSR_\phi P^{-1} = P
\begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix}
\begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}
P^{-1} \tag{18}
\]

If we now view P as the transition matrix from the basis $B = \{\operatorname{Re}(\mathbf{x}), \operatorname{Im}(\mathbf{x})\}$ to the standard basis, then 18 tells us that
computing a product $A\mathbf{x}_0$ can be broken down into a three-step process:


Step 1 Map $\mathbf{x}_0$ from standard coordinates into B-coordinates by forming the product $P^{-1}\mathbf{x}_0$.
Step 2 Rotate and scale the vector $P^{-1}\mathbf{x}_0$ by forming the product $SR_\phi P^{-1}\mathbf{x}_0$.


Step 3 Map the rotated and scaled vector back to standard coordinates to obtain $A\mathbf{x}_0 = PSR_\phi P^{-1}\mathbf{x}_0$.


Power Sequences 


There are many problems in which one is interested in how successive applications of a matrix transformation affect a specific 
vector. For example, if A is the standard matrix for an operator on $R^n$ and $\mathbf{x}_0$ is some fixed vector in $R^n$, then one might be
interested in the behavior of the power sequence

\[
\mathbf{x}_0,\; A\mathbf{x}_0,\; A^2\mathbf{x}_0,\; \ldots,\; A^k\mathbf{x}_0,\; \ldots
\]


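For example, if

\[
A = \begin{bmatrix} \frac{1}{2} & \frac{3}{4} \\[1mm] -\frac{3}{5} & \frac{11}{10} \end{bmatrix}
\quad\text{and}\quad
\mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]

then with the help of a computer or calculator one can show that the first four terms in the power sequence are

\[
\mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad
A\mathbf{x}_0 = \begin{bmatrix} 1.25 \\ 0.5 \end{bmatrix}, \quad
A^2\mathbf{x}_0 = \begin{bmatrix} 1.0 \\ -0.2 \end{bmatrix}, \quad
A^3\mathbf{x}_0 = \begin{bmatrix} 0.35 \\ -0.82 \end{bmatrix}
\]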


With the help of MATLAB or a computer algebra system one can show that if the first 100 terms are plotted as ordered pairs 
(x, y), then the points move along the elliptical path shown in Figure 5.3.4a. 
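
As a rough sketch of the computation described above, the following Python fragment generates terms of the power sequence exactly, using rational arithmetic from the standard library. The function names `apply` and `power_sequence` are illustrative choices, not part of the text; the matrix and starting vector are the ones in the example.

```python
from fractions import Fraction as F

A = [[F(1, 2), F(3, 4)], [F(-3, 5), F(11, 10)]]
x0 = (F(1), F(1))

def apply(M, x):
    """Multiply a 2x2 matrix by a vector of length 2."""
    return (M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1])

def power_sequence(M, start, count):
    """Return [x0, A x0, A^2 x0, ...] with `count` terms."""
    terms = [start]
    for _ in range(count - 1):
        terms.append(apply(M, terms[-1]))
    return terms

first_four = power_sequence(A, x0, 4)
for term in first_four:
    print(float(term[0]), float(term[1]))
```

Calling `power_sequence(A, x0, 100)` gives the 100 points that would be plotted as ordered pairs (x, y).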


Figure 5.3.4 [Panels (a), (b), and (c): points of the power sequence starting at $\mathbf{x}_0 = (1, 1)$, plotted in the xy-plane.]


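To understand why the points move along an elliptical path, we will need to examine the eigenvalues and eigenvectors of A.
We leave it for you to show that the eigenvalues of A are $\lambda = \frac{4}{5} \pm \frac{3}{5}i$ and that the corresponding eigenvectors are

\[
\lambda_1 = \tfrac{4}{5} - \tfrac{3}{5}i:\;\; \mathbf{v}_1 = \left(\tfrac{1}{2} + i,\, 1\right)
\quad\text{and}\quad
\lambda_2 = \tfrac{4}{5} + \tfrac{3}{5}i:\;\; \mathbf{v}_2 = \left(\tfrac{1}{2} - i,\, 1\right)
\]

If we take $\lambda = \lambda_1 = \frac{4}{5} - \frac{3}{5}i$ and $\mathbf{x} = \mathbf{v}_1 = \left(\frac{1}{2} + i, 1\right)$ in 17 and use the fact that $|\lambda| = 1$, then we obtain the factorization

\[
\underbrace{\begin{bmatrix} \frac{1}{2} & \frac{3}{4} \\[1mm] -\frac{3}{5} & \frac{11}{10} \end{bmatrix}}_{A}
= \underbrace{\begin{bmatrix} \frac{1}{2} & 1 \\ 1 & 0 \end{bmatrix}}_{P}
\underbrace{\begin{bmatrix} \frac{4}{5} & -\frac{3}{5} \\[1mm] \frac{3}{5} & \frac{4}{5} \end{bmatrix}}_{R_\phi}
\underbrace{\begin{bmatrix} 0 & 1 \\ 1 & -\frac{1}{2} \end{bmatrix}}_{P^{-1}}
\tag{19}
\]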


where $R_\phi$ is a rotation about the origin through the angle $\phi$ whose tangent is

\[
\tan\phi = \frac{\sin\phi}{\cos\phi} = \frac{3/5}{4/5} = \frac{3}{4}
\]


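The matrix P in 19 is the transition matrix from the basis

\[
B = \{\operatorname{Re}(\mathbf{x}), \operatorname{Im}(\mathbf{x})\} = \left\{\left(\tfrac{1}{2}, 1\right), (1, 0)\right\}
\]

to the standard basis, and $P^{-1}$ is the transition matrix from the standard basis to the basis B (Figure 5.3.5). Next, observe that
if n is a positive integer, then 19 implies that

\[
A^n\mathbf{x}_0 = (P R_\phi P^{-1})^n \mathbf{x}_0 = P R_\phi^n P^{-1}\mathbf{x}_0
\]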


so the product $A^n\mathbf{x}_0$ can be computed by first mapping $\mathbf{x}_0$ into the point $P^{-1}\mathbf{x}_0$ in B-coordinates, then multiplying by $R_\phi^n$ to
rotate this point about the origin through the angle $n\phi$, and then multiplying $R_\phi^n P^{-1}\mathbf{x}_0$ by P to map the resulting point back to
standard coordinates. We can now see what is happening geometrically: In B-coordinates each successive multiplication by A
causes the point $P^{-1}\mathbf{x}_0$ to advance through an angle $\phi$, thereby tracing a circular orbit about the origin. However, the basis B
is skewed (not orthogonal), so when the points on the circular orbit are transformed back to standard coordinates, the effect is
to distort the circular orbit into the elliptical orbit traced by $A^n\mathbf{x}_0$ (Figure 5.3.4b). Here are the computations for the first step
(successive steps are illustrated in Figure 5.3.4c):


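\[
\begin{aligned}
\begin{bmatrix} \frac{1}{2} & \frac{3}{4} \\[1mm] -\frac{3}{5} & \frac{11}{10} \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \end{bmatrix}
&= \begin{bmatrix} \frac{1}{2} & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} \frac{4}{5} & -\frac{3}{5} \\[1mm] \frac{3}{5} & \frac{4}{5} \end{bmatrix}
\begin{bmatrix} 0 & 1 \\ 1 & -\frac{1}{2} \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \end{bmatrix} \\
&= \begin{bmatrix} \frac{1}{2} & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} \frac{4}{5} & -\frac{3}{5} \\[1mm] \frac{3}{5} & \frac{4}{5} \end{bmatrix}
\begin{bmatrix} 1 \\ \frac{1}{2} \end{bmatrix}
&& [\mathbf{x}_0 \text{ is mapped to } B\text{-coordinates.}] \\
&= \begin{bmatrix} \frac{1}{2} & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} \frac{1}{2} \\ 1 \end{bmatrix}
&& [\text{The point } (1, \tfrac{1}{2}) \text{ is rotated through the angle } \phi.] \\
&= \begin{bmatrix} \frac{5}{4} \\[1mm] \frac{1}{2} \end{bmatrix}
&& [\text{The point } (\tfrac{1}{2}, 1) \text{ is mapped to standard coordinates.}]
\end{aligned}
\]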
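
The claim that the orbit is circular in B-coordinates can be checked numerically. The sketch below is an illustration only: it hard-codes the matrices A and $P^{-1}$ from factorization 19, and the helper `apply` and the list `squared_radii` are hypothetical names. It maps the first ten terms of the power sequence into B-coordinates and records the squared distance of each from the origin.

```python
from fractions import Fraction as F

A = [[F(1, 2), F(3, 4)], [F(-3, 5), F(11, 10)]]
P_inv = [[F(0), F(1)], [F(1), F(-1, 2)]]    # inverse of P = [Re(x) Im(x)]

def apply(M, x):
    """Multiply a 2x2 matrix by a vector of length 2."""
    return (M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1])

x = (F(1), F(1))                             # x0 in standard coordinates
squared_radii = []
for _ in range(10):
    b = apply(P_inv, x)                      # B-coordinates of the current term
    squared_radii.append(b[0] ** 2 + b[1] ** 2)
    x = apply(A, x)                          # advance to the next term

print(squared_radii)
```

Because $R_\phi$ is a pure rotation, every entry of `squared_radii` equals 5/4, the squared length of $P^{-1}\mathbf{x}_0 = (1, \tfrac{1}{2})$.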


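Figure 5.3.5 [The basis vectors $\operatorname{Re}(\mathbf{x}) = (\tfrac{1}{2}, 1)$ and $\operatorname{Im}(\mathbf{x}) = (1, 0)$.]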


Concept Review 


• Real part of z
• Imaginary part of z
• Modulus of z
• Complex conjugate of z
• Argument of z
• Polar form of z
• Complex vector space
• Complex n-tuple
• Complex n-space
• Real matrix
• Complex matrix
• Complex Euclidean inner product
• Euclidean norm on $C^n$
• Antisymmetry property
• Complex eigenvalue
• Complex eigenvector
• Eigenspace in $C^n$
• Discriminant


Skills 

• Find the real part, imaginary part, and complex conjugate of a complex matrix or vector.
• Find the determinant of a complex matrix.
• Find complex inner products and norms of complex vectors.
• Find the eigenvalues and bases for the eigenspaces of complex matrices.
• Factor a $2 \times 2$ real matrix with complex eigenvalues into a product of a scaling matrix and a rotation matrix.


Exercise Set 5.3 


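In Exercises 1–2, find $\bar{\mathbf{u}}$, Re(u), Im(u), and $\|\mathbf{u}\|$.
1. $\mathbf{u} = (2 - i,\; 4i,\; 1 + i)$
Answer:

$\bar{\mathbf{u}} = (2 + i,\; -4i,\; 1 - i)$, $\operatorname{Re}(\mathbf{u}) = (2, 0, 1)$, $\operatorname{Im}(\mathbf{u}) = (-1, 4, 1)$, $\|\mathbf{u}\| = \sqrt{23}$
2. $\mathbf{u} = (6,\; 1 + 4i,\; 6 - 2i)$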


In Exercises 3–4, show that u, v, and k satisfy Theorem 5.3.1.

3. $\mathbf{u} = (3 - 4i,\; 2 + i,\; -6i)$, $\mathbf{v} = (1 + i,\; 2 - i,\; 4)$, $k = i$

4. $\mathbf{u} = (6,\; 1 + 4i,\; 6 - 2i)$, $\mathbf{v} = (4,\; 3 + 2i,\; i - 3)$, $k = -i$

5. Solve the equation $i\mathbf{x} - 3\mathbf{v} = \bar{\mathbf{u}}$ for x, where u and v are the vectors in Exercise 3.
Answer:

$\mathbf{x} = (7 - 6i,\; -4 - 8i,\; 6 - 12i)$
6. Solve the equation $(1 + i)\mathbf{x} + 2\mathbf{u} = \bar{\mathbf{v}}$ for x, where u and v are the vectors in Exercise 4.


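In Exercises 7–8, find $\bar{A}$, Re(A), Im(A), det(A), and tr(A).

7. $A = \begin{bmatrix} -5i & 4 \\ 2 - i & 1 + 5i \end{bmatrix}$

Answer:

$\bar{A} = \begin{bmatrix} 5i & 4 \\ 2 + i & 1 - 5i \end{bmatrix}$, $\operatorname{Re}(A) = \begin{bmatrix} 0 & 4 \\ 2 & 1 \end{bmatrix}$, $\operatorname{Im}(A) = \begin{bmatrix} -5 & 0 \\ -1 & 5 \end{bmatrix}$, $\det(A) = 17 - i$, $\operatorname{tr}(A) = 1$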
8. 42-33 
A= 

ie "| 


9. Let A be the matrix given in Exercise 7, and let B be the matrix 


[a 


Confirm that these matrices have the properties stated in Theorem 5.3.2. 


10. Let A be the matrix given in Exercise 8, and let B be the matrix 
5i 
B= 
l= | 
Confirm that these matrices have the properties stated in Theorem 5.3.2. 


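In Exercises 11–12, compute $\mathbf{u} \cdot \mathbf{v}$, $\mathbf{u} \cdot \mathbf{w}$, and $\mathbf{v} \cdot \mathbf{w}$, and show that the vectors satisfy Formula 5 and parts (a), (b), and (c)
of Theorem 5.3.3.

11. $\mathbf{u} = (i,\; 2i,\; 3)$, $\mathbf{v} = (4,\; -2i,\; 1 + i)$, $\mathbf{w} = (2 - i,\; 2i,\; 5 + 3i)$, $k = 2i$
Answer:

$\mathbf{u} \cdot \mathbf{v} = -1 + i$, $\mathbf{u} \cdot \mathbf{w} = 18 - 7i$, $\mathbf{v} \cdot \mathbf{w} = 12 + 6i$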
12,.u= (1+37,4,33), v=, —4,24+33), w=(1-7,4,4-53), k=1+4: 


13. Compute (u - ¥) —wW~ u for the vectors u, v, and w in Exercise 11. 


Answer: 


-11— 14: 


14. Compute (ru . w) ++ (||u||¥) - u for the vectors u, v, and w in Exercise 12. 


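In Exercises 15–18, find the eigenvalues and bases for the eigenspaces of A.

15. $A = \begin{bmatrix} 4 & -5 \\ 1 & 0 \end{bmatrix}$

Answer:

$\lambda_1 = 2 - i$, $\mathbf{x}_1 = \begin{bmatrix} 2 - i \\ 1 \end{bmatrix}$; $\lambda_2 = 2 + i$, $\mathbf{x}_2 = \begin{bmatrix} 2 + i \\ 1 \end{bmatrix}$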


ieee 


4 7 
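17. $A = \begin{bmatrix} 5 & -2 \\ 1 & 3 \end{bmatrix}$
Answer:

$\lambda_1 = 4 - i$, $\mathbf{x}_1 = \begin{bmatrix} 1 - i \\ 1 \end{bmatrix}$; $\lambda_2 = 4 + i$, $\mathbf{x}_2 = \begin{bmatrix} 1 + i \\ 1 \end{bmatrix}$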


is.,_[ 8 6 
“(532 


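In Exercises 19–22, each matrix C has form 15. Theorem 5.3.7 implies that C is the product of a scaling matrix with factor
$|\lambda|$ and a rotation matrix with angle $\phi$. Find $|\lambda|$ and $\phi$ for which $-\pi < \phi \le \pi$.

19. $C = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$

Answer:

$|\lambda| = \sqrt{2}$, $\phi = \dfrac{\pi}{4}$

20. $C = \begin{bmatrix} 0 & 5 \\ -5 & 0 \end{bmatrix}$

21. $C = \begin{bmatrix} 1 & \sqrt{3} \\ -\sqrt{3} & 1 \end{bmatrix}$
Answer:
$|\lambda| = 2$, $\phi = -\dfrac{\pi}{3}$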


| -v2 ¥2 


In Exercises 23–26, find an invertible matrix P and a matrix C of form 15 such that $A = PCP^{-1}$.


oleae 


ae y2 ¥2 


4 7 


Answer: 
—2 —1 3 =—2 
al Eo 


ne 3a 


1 0 
25. 8 6 
A= 
[5 | 
Answer: 


[i hel 


26. 5 =2 
A= 


27. Find all complex scalars k, if any, for which u and v are orthogonal in $C^3$.
(a) $\mathbf{u} = (2i,\; i,\; 3i)$, $\mathbf{v} = (i,\; 6i,\; k)$
(b) $\mathbf{u} = (k,\; k,\; 1 + i)$, $\mathbf{v} = (1,\; -1,\; 1 - i)$

Answer:

(a) $k = -\frac{8}{3}i$
(b) None
28. Show that if A is a real $n \times n$ matrix and x is a column vector in $C^n$, then Re(Ax) = A(Re(x)) and Im(Ax) = A(Im(x)).


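29. The matrices

\[
\sigma_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad
\sigma_2 = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}, \quad
\sigma_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}
\]

called Pauli spin matrices, are used in quantum mechanics to study particle spin. The Dirac matrices, which are also used
in quantum mechanics, are expressed in terms of the Pauli spin matrices and the $2 \times 2$ identity matrix $I_2$ as

\[
\beta = \begin{bmatrix} I_2 & 0 \\ 0 & -I_2 \end{bmatrix}, \quad
\alpha_x = \begin{bmatrix} 0 & \sigma_1 \\ \sigma_1 & 0 \end{bmatrix}, \quad
\alpha_y = \begin{bmatrix} 0 & \sigma_2 \\ \sigma_2 & 0 \end{bmatrix}, \quad
\alpha_z = \begin{bmatrix} 0 & \sigma_3 \\ \sigma_3 & 0 \end{bmatrix}
\]

(a) Show that $\beta^2 = \alpha_x^2 = \alpha_y^2 = \alpha_z^2$.

(b) Matrices A and B for which $AB = -BA$ are said to be anticommutative. Show that the Dirac matrices are
anticommutative.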
30. If k is a real scalar and v is a vector in $R^n$, then Theorem 3.2.1 states that $\|k\mathbf{v}\| = |k|\,\|\mathbf{v}\|$. Is this relationship also true if k
is a complex scalar and v is a vector in $C^n$? Justify your answer.
31. Prove part ( c) of Theorem 5.3.1. 
32. Prove Theorem 5.3.2. 


33. Prove that if u and v are vectors in $C^n$, then

\[
\mathbf{u} \cdot \mathbf{v} = \tfrac{1}{4}\|\mathbf{u} + \mathbf{v}\|^2 - \tfrac{1}{4}\|\mathbf{u} - \mathbf{v}\|^2
+ \tfrac{i}{4}\|\mathbf{u} + i\mathbf{v}\|^2 - \tfrac{i}{4}\|\mathbf{u} - i\mathbf{v}\|^2
\]

34. It follows from Theorem 5.3.7 that the eigenvalues of the rotation matrix

\[
R_\phi = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}
\]

are $\lambda = \cos\phi \pm i\sin\phi$. Prove that if x is an eigenvector corresponding to either eigenvalue, then Re(x) and Im(x) are
orthogonal and have the same length. [Note: This implies that $P = [\,\operatorname{Re}(\mathbf{x}) \;\; \operatorname{Im}(\mathbf{x})\,]$ is a real scalar multiple of an
orthogonal matrix.]


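35. The two parts of this exercise lead you through a proof of Theorem 5.3.8.

(a) For notational simplicity, let

\[
M = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}
\]

and let $\mathbf{u} = \operatorname{Re}(\mathbf{x})$ and $\mathbf{v} = \operatorname{Im}(\mathbf{x})$, so $P = [\mathbf{u} \mid \mathbf{v}]$. Show that the relationship $A\mathbf{x} = \lambda\mathbf{x}$ implies that

\[
A\mathbf{x} = (a\mathbf{u} + b\mathbf{v}) + i(-b\mathbf{u} + a\mathbf{v})
\]

and then equate real and imaginary parts in this equation to show that

\[
AP = [A\mathbf{u} \mid A\mathbf{v}] = [a\mathbf{u} + b\mathbf{v} \mid -b\mathbf{u} + a\mathbf{v}] = PM
\]

(b) Show that P is invertible, thereby completing the proof, since the result in part (a) implies that $A = PMP^{-1}$. [Hint: If
P is not invertible, then one of its column vectors is a real scalar multiple of the other, say $\mathbf{v} = c\mathbf{u}$. Substitute this into
the equations $A\mathbf{u} = a\mathbf{u} + b\mathbf{v}$ and $A\mathbf{v} = -b\mathbf{u} + a\mathbf{v}$ obtained in part (a), and show that $(1 + c^2)b\mathbf{u} = \mathbf{0}$. Finally, show
that this leads to a contradiction, thereby proving that P is invertible.]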


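36. In this problem you will prove the complex analog of the Cauchy-Schwarz inequality.
(a) Prove: If k is a complex number, and u and v are vectors in $C^n$, then

\[
(\mathbf{u} - k\mathbf{v}) \cdot (\mathbf{u} - k\mathbf{v}) = \mathbf{u} \cdot \mathbf{u} - \bar{k}(\mathbf{u} \cdot \mathbf{v}) - k\,\overline{(\mathbf{u} \cdot \mathbf{v})} + k\bar{k}(\mathbf{v} \cdot \mathbf{v})
\]

(b) Use the result in part (a) to prove that

\[
0 \le \mathbf{u} \cdot \mathbf{u} - \bar{k}(\mathbf{u} \cdot \mathbf{v}) - k\,\overline{(\mathbf{u} \cdot \mathbf{v})} + k\bar{k}(\mathbf{v} \cdot \mathbf{v})
\]

(c) Take $k = (\mathbf{u} \cdot \mathbf{v}) / (\mathbf{v} \cdot \mathbf{v})$ in part (b) to prove that

\[
|\mathbf{u} \cdot \mathbf{v}| \le \|\mathbf{u}\|\,\|\mathbf{v}\|
\]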


True-False Exercises 


In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 


(a) There is a real 5 x 5 matrix with no real eigenvalues. 


Answer: 


False 


(b) The eigenvalues of a $2 \times 2$ complex matrix are the solutions of the equation $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$.


Answer: 


True 


(c) Matrices that have the same complex eigenvalues with the same algebraic multiplicities have the same trace. 
Answer: 


False 


(d) If $\lambda$ is a complex eigenvalue of a real matrix A with a corresponding complex eigenvector v, then $\bar{\lambda}$ is a complex
eigenvalue of A and $\bar{\mathbf{v}}$ is a complex eigenvector of A corresponding to $\bar{\lambda}$.


Answer: 


True 


(e) Every eigenvalue of a complex symmetric matrix is real. 
Answer: 


False 


(f) If a $2 \times 2$ real matrix A has complex eigenvalues and $\mathbf{x}_0$ is a vector in $R^2$, then the vectors $\mathbf{x}_0, A\mathbf{x}_0, A^2\mathbf{x}_0, \ldots, A^n\mathbf{x}_0, \ldots$ lie
on an ellipse.
Answer: 


False 




5.4 Differential Equations 


Many laws of physics, chemistry, biology, engineering, and economics are described in terms of “differential 
equations”—that is, equations involving functions and their derivatives. In this section we will illustrate one way in 
which linear algebra, eigenvalues and eigenvectors can be applied to solving systems of differential equations. 
Calculus is a prerequisite for this section. 


Terminology 


Recall from calculus that a differential equation is an equation involving unknown functions and their derivatives. 
The order of a differential equation is the order of the highest derivative it contains. The simplest differential 
equations are the first-order equations of the form 


y' =ay (1) 


where $y = f(x)$ is an unknown differentiable function to be determined, $y' = dy/dx$ is its derivative, and a is a
constant. As with most differential equations, this equation has infinitely many solutions; they are the functions of the
form

\[
y = ce^{ax} \tag{2}
\]

where c is an arbitrary constant. That every function of this form is a solution of 1 follows from the computation

\[
y' = cae^{ax} = ay
\]

and that these are the only solutions is shown in the exercises. Accordingly, we call 2 the general solution of 1. As an
example, the general solution of the differential equation $y' = 5y$ is

\[
y = ce^{5x} \tag{3}
\]


Often, a physical problem that leads to a differential equation imposes some conditions that enable us to isolate one
particular solution from the general solution. For example, if we require that solution 3 of the equation $y' = 5y$
satisfy the added condition

\[
y(0) = 6 \tag{4}
\]

(that is, $y = 6$ when $x = 0$), then on substituting these values in 3, we obtain $6 = ce^0 = c$, from which we conclude
that

\[
y = 6e^{5x}
\]

is the only solution of $y' = 5y$ that satisfies 4.

A condition such as 4, which specifies the value of the general solution at a point, is called an initial condition, and
the problem of solving a differential equation subject to an initial condition is called an initial-value problem.


First-Order Linear Systems 


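In this section we will be concerned with solving systems of differential equations of the form

\[
\begin{aligned}
y_1' &= a_{11}y_1 + a_{12}y_2 + \cdots + a_{1n}y_n \\
y_2' &= a_{21}y_1 + a_{22}y_2 + \cdots + a_{2n}y_n \\
&\;\;\vdots \\
y_n' &= a_{n1}y_1 + a_{n2}y_2 + \cdots + a_{nn}y_n
\end{aligned}
\tag{5}
\]

where $y_1 = f_1(x), y_2 = f_2(x), \ldots, y_n = f_n(x)$ are functions to be determined, and the $a_{ij}$'s are constants. In
matrix notation, 5 can be written as

\[
\begin{bmatrix} y_1' \\ y_2' \\ \vdots \\ y_n' \end{bmatrix}
= \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\]

or, more briefly, as

\[
\mathbf{y}' = A\mathbf{y} \tag{6}
\]

where the notation $\mathbf{y}'$ denotes the vector obtained by differentiating each component of y.

A system of differential equations of form 5 is called a first-order linear system.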


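EXAMPLE 1 Solution of a Linear System with Initial Conditions

(a) Write the following system in matrix form:

\[
\begin{aligned}
y_1' &= 3y_1 \\
y_2' &= -2y_2 \\
y_3' &= 5y_3
\end{aligned}
\tag{7}
\]

(b) Solve the system.

(c) Find a solution of the system that satisfies the initial conditions $y_1(0) = 1$, $y_2(0) = 4$, and $y_3(0) = -2$.

Solution
(a)
\[
\begin{bmatrix} y_1' \\ y_2' \\ y_3' \end{bmatrix}
= \begin{bmatrix} 3 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 5 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
\tag{8}
\]
or
\[
\mathbf{y}' = \begin{bmatrix} 3 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 5 \end{bmatrix} \mathbf{y}
\tag{9}
\]

(b) Because each equation in 7 involves only one unknown function, we can solve the equations
individually. It follows from 2 that these solutions are

\[
y_1 = c_1e^{3x}, \quad y_2 = c_2e^{-2x}, \quad y_3 = c_3e^{5x}
\]
or, in matrix notation,
\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
= \begin{bmatrix} c_1e^{3x} \\ c_2e^{-2x} \\ c_3e^{5x} \end{bmatrix}
\tag{10}
\]

(c) From the given initial conditions, we obtain

\[
\begin{aligned}
1 &= y_1(0) = c_1e^0 = c_1 \\
4 &= y_2(0) = c_2e^0 = c_2 \\
-2 &= y_3(0) = c_3e^0 = c_3
\end{aligned}
\]
so the solution satisfying these conditions is
\[
y_1 = e^{3x}, \quad y_2 = 4e^{-2x}, \quad y_3 = -2e^{5x}
\]
or, in matrix notation,
\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
= \begin{bmatrix} e^{3x} \\ 4e^{-2x} \\ -2e^{5x} \end{bmatrix}
\]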

Solution by Diagonalization 


What made the system in Example 1 easy to solve was the fact that each equation involved only one of the unknown
functions, so its matrix formulation, $\mathbf{y}' = A\mathbf{y}$, had a diagonal coefficient matrix A [Formula 9]. A more complicated
situation occurs when some or all of the equations in the system involve more than one of the unknown functions, for
in this case the coefficient matrix is not diagonal. Let us now consider how we might solve such a system.


The basic idea for solving a system y’ = Ay whose coefficient matrix A is not diagonal is to introduce a new 
unknown vector u that is related to the unknown vector y by an equation of the form y = Pu in which P is an 
invertible matrix that diagonalizes A. Of course, such a matrix may or may not exist, but if it does then we can rewrite 
the equation $\mathbf{y}' = A\mathbf{y}$ as

\[
P\mathbf{u}' = A(P\mathbf{u})
\]

or alternatively as

\[
\mathbf{u}' = (P^{-1}AP)\mathbf{u}
\]

Since P is assumed to diagonalize A, this equation has the form

\[
\mathbf{u}' = D\mathbf{u}
\]


where D is diagonal. We can now solve this equation for u using the method of Example 1, and then obtain y by 
matrix multiplication using the relationship y = Pu. 


In summary, we have the following procedure for solving a system $\mathbf{y}' = A\mathbf{y}$ in the case where A is diagonalizable.


A Procedure for Solving y’ = Ay if A is Diagonalizable 


Step 1. Find a matrix P that diagonalizes A. 

Step 2. Make the substitutions $\mathbf{y} = P\mathbf{u}$ and $\mathbf{y}' = P\mathbf{u}'$ to obtain a new “diagonal system” $\mathbf{u}' = D\mathbf{u}$, where
$D = P^{-1}AP$.

Step 3. Solve $\mathbf{u}' = D\mathbf{u}$.

Step 4. Determine y from the equation y = Pu. 
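
The four steps can be sketched in Python for the smallest interesting case. The function below is a hypothetical helper, not part of the text, and it rests on explicit assumptions: A is $2 \times 2$ with two distinct real eigenvalues and a nonzero upper-right entry, so that $(a_{12}, \lambda - a_{11})$ is an eigenvector for each eigenvalue $\lambda$. The coefficient matrix and initial values used in the call are illustrative choices.

```python
import math

def solve_2x2(A, y0):
    """Return a function x -> (y1(x), y2(x)) solving y' = Ay with y(0) = y0.
    Assumes A is 2x2 with distinct real eigenvalues and A[0][1] != 0."""
    (p, q), (r, s) = A
    tr, det = p + s, p * s - q * r
    root = math.sqrt(tr * tr - 4 * det)      # positive for distinct real eigenvalues
    lam1, lam2 = (tr + root) / 2, (tr - root) / 2
    # Step 1: the columns of P are the eigenvectors (q, lam - p).
    P = [[q, q], [lam1 - p, lam2 - p]]
    # Steps 2-3: u' = Du has solutions u_i = c_i * e^(lam_i * x);
    # the constants come from c = P^{-1} y0.
    det_P = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    c1 = (P[1][1] * y0[0] - P[0][1] * y0[1]) / det_P
    c2 = (-P[1][0] * y0[0] + P[0][0] * y0[1]) / det_P

    def y(x):
        u1 = c1 * math.exp(lam1 * x)
        u2 = c2 * math.exp(lam2 * x)
        # Step 4: y = Pu.
        return (P[0][0] * u1 + P[0][1] * u2, P[1][0] * u1 + P[1][1] * u2)

    return y

y = solve_2x2([[1, 1], [4, -2]], (1, 6))
print(y(0.0), y(0.5))
```

For this choice of A and initial values the eigenvalues are 2 and −3, and the returned function agrees with $y_1 = 2e^{2x} - e^{-3x}$, $y_2 = 2e^{2x} + 4e^{-3x}$.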


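EXAMPLE 2 Solution Using Diagonalization

(a) Solve the system

\[
\begin{aligned}
y_1' &= y_1 + y_2 \\
y_2' &= 4y_1 - 2y_2
\end{aligned}
\]

(b) Find the solution that satisfies the initial conditions $y_1(0) = 1$, $y_2(0) = 6$.

Solution

(a) The coefficient matrix for the system is

\[
A = \begin{bmatrix} 1 & 1 \\ 4 & -2 \end{bmatrix}
\]

As discussed in Section 5.2, A will be diagonalized by any matrix P whose columns are linearly
independent eigenvectors of A. Since

\[
\det(\lambda I - A) = \begin{vmatrix} \lambda - 1 & -1 \\ -4 & \lambda + 2 \end{vmatrix}
= \lambda^2 + \lambda - 6 = (\lambda + 3)(\lambda - 2)
\]

the eigenvalues of A are $\lambda = 2$ and $\lambda = -3$. By definition,

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\]

is an eigenvector of A corresponding to $\lambda$ if and only if x is a nontrivial solution of

\[
\begin{bmatrix} \lambda - 1 & -1 \\ -4 & \lambda + 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]

If $\lambda = 2$, this system becomes

\[
\begin{bmatrix} 1 & -1 \\ -4 & 4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]

Solving this system yields $x_1 = t$, $x_2 = t$, so

\[
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} t \\ t \end{bmatrix} = t\begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]

Thus,

\[
\mathbf{p}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]

is a basis for the eigenspace corresponding to $\lambda = 2$. Similarly, you can show that

\[
\mathbf{p}_2 = \begin{bmatrix} -\frac{1}{4} \\ 1 \end{bmatrix}
\]

is a basis for the eigenspace corresponding to $\lambda = -3$. Thus,

\[
P = \begin{bmatrix} 1 & -\frac{1}{4} \\ 1 & 1 \end{bmatrix}
\]

diagonalizes A, and

\[
D = P^{-1}AP = \begin{bmatrix} 2 & 0 \\ 0 & -3 \end{bmatrix}
\]

Thus, as noted in Step 2 of the procedure stated above, the substitution

\[
\mathbf{y} = P\mathbf{u} \quad\text{and}\quad \mathbf{y}' = P\mathbf{u}'
\]

yields the “diagonal system”

\[
\mathbf{u}' = D\mathbf{u} = \begin{bmatrix} 2 & 0 \\ 0 & -3 \end{bmatrix}\mathbf{u}
\quad\text{or}\quad
\begin{aligned} u_1' &= 2u_1 \\ u_2' &= -3u_2 \end{aligned}
\]

From 2 the solution of this system is

\[
\begin{aligned} u_1 &= c_1e^{2x} \\ u_2 &= c_2e^{-3x} \end{aligned}
\quad\text{or}\quad
\mathbf{u} = \begin{bmatrix} c_1e^{2x} \\ c_2e^{-3x} \end{bmatrix}
\]

so the equation $\mathbf{y} = P\mathbf{u}$ yields, as the solution for y,

\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
= \begin{bmatrix} 1 & -\frac{1}{4} \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} c_1e^{2x} \\ c_2e^{-3x} \end{bmatrix}
= \begin{bmatrix} c_1e^{2x} - \frac{1}{4}c_2e^{-3x} \\ c_1e^{2x} + c_2e^{-3x} \end{bmatrix}
\]

or

\[
\begin{aligned}
y_1 &= c_1e^{2x} - \tfrac{1}{4}c_2e^{-3x} \\
y_2 &= c_1e^{2x} + c_2e^{-3x}
\end{aligned}
\tag{11}
\]

(b) If we substitute the given initial conditions in 11, we obtain

\[
\begin{aligned}
c_1 - \tfrac{1}{4}c_2 &= 1 \\
c_1 + c_2 &= 6
\end{aligned}
\]

Solving this system, we obtain $c_1 = 2$, $c_2 = 4$, so it follows from 11 that the solution satisfying
the initial conditions is

\[
\begin{aligned}
y_1 &= 2e^{2x} - e^{-3x} \\
y_2 &= 2e^{2x} + 4e^{-3x}
\end{aligned}
\]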


Remark Keep in mind that the method of Example 2 works because the coefficient matrix of the system can be 
diagonalized. In cases where this is not so, other methods are required. These are typically discussed in books 
devoted to differential equations. 


Concept Review 


• Differential equation
• Order of a differential equation
• General solution
• Particular solution
• Initial condition
• Initial-value problem
• First-order linear system

Skills
• Find the matrix form of a system of linear differential equations.
• Find the general solution of a system of linear differential equations by diagonalization.
• Find the particular solution of a system of linear differential equations satisfying an initial condition.


Exercise Set 5.4 


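1. (a) Solve the system

\[
\begin{aligned}
y_1' &= y_1 + 4y_2 \\
y_2' &= 2y_1 + 3y_2
\end{aligned}
\]

(b) Find the solution that satisfies the initial conditions $y_1(0) = 0$, $y_2(0) = 0$.

Answer:

(a) $y_1 = c_1e^{5x} - 2c_2e^{-x}$
$y_2 = c_1e^{5x} + c_2e^{-x}$
(b) $y_1 = 0$
$y_2 = 0$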


2. (a) Solve the system

\[
\begin{aligned}
y_1' &= y_1 + 3y_2 \\
y_2' &= 4y_1 + 5y_2
\end{aligned}
\]

(b) Find the solution that satisfies the conditions $y_1(0) = 2$, $y_2(0) = 1$.


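3. (a) Solve the system

\[
\begin{aligned}
y_1' &= 4y_1 + y_3 \\
y_2' &= -2y_1 + y_2 \\
y_3' &= -2y_1 + y_3
\end{aligned}
\]

(b) Find the solution that satisfies the initial conditions $y_1(0) = -1$, $y_2(0) = 1$, $y_3(0) = 0$.

Answer:

(a) $y_1 = -c_2e^{2x} + c_3e^{3x}$
$y_2 = c_1e^{x} + 2c_2e^{2x} - c_3e^{3x}$
$y_3 = 2c_2e^{2x} - c_3e^{3x}$

(b) $y_1 = e^{2x} - 2e^{3x}$
$y_2 = e^{x} - 2e^{2x} + 2e^{3x}$
$y_3 = -2e^{2x} + 2e^{3x}$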


4. Solve the system

\[
\begin{aligned}
y_1' &= 4y_1 + 2y_2 + 2y_3 \\
y_2' &= 2y_1 + 4y_2 + 2y_3 \\
y_3' &= 2y_1 + 2y_2 + 4y_3
\end{aligned}
\]

5. Show that every solution of $y' = ay$ has the form $y = ce^{ax}$.

[Hint: Let $y = f(x)$ be a solution of the equation, and show that $f(x)e^{-ax}$ is constant.]


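6. Show that if A is diagonalizable and

\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\]

is a solution of the system $\mathbf{y}' = A\mathbf{y}$, then each $y_i$ is a linear combination of $e^{\lambda_1 x}, e^{\lambda_2 x}, \ldots, e^{\lambda_n x}$, where
$\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A.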
7. Sometimes it is possible to solve a single higher-order linear differential equation with constant coefficients by
expressing it as a system and applying the methods of this section. For the differential equation $y'' - y' - 6y = 0$,
show that the substitutions $y_1 = y$ and $y_2 = y'$ lead to the system

\[
\begin{aligned}
y_1' &= y_2 \\
y_2' &= 6y_1 + y_2
\end{aligned}
\]

Solve this system, and use the result to solve the original differential equation.

Answer:

$y = c_1e^{3x} + c_2e^{-2x}$

8. Use the procedure in Exercise 7 to solve $y'' + y' - 12y = 0$.

9. Explain how you might use the procedure in Exercise 7 to solve $y''' - 6y'' + 11y' - 6y = 0$. Use your
procedure to solve the equation.

Answer:

$y = c_1e^{x} + c_2e^{2x} + c_3e^{3x}$


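10. (a) By rewriting 11 in matrix form, show that the solution of the system in Example 2 can be expressed as

\[
\mathbf{y} = c_1e^{2x}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2e^{-3x}\begin{bmatrix} -\frac{1}{4} \\ 1 \end{bmatrix}
\]

This is called the general solution of the system.

(b) Note that in part (a), the vector in the first term is an eigenvector corresponding to the eigenvalue $\lambda_1 = 2$, and
the vector in the second term is an eigenvector corresponding to the eigenvalue $\lambda_2 = -3$. This is a special
case of the following general result:

Theorem. If the coefficient matrix A of the system $\mathbf{y}' = A\mathbf{y}$ is diagonalizable, then the general
solution of the system can be expressed as

\[
\mathbf{y} = c_1e^{\lambda_1 x}\mathbf{x}_1 + c_2e^{\lambda_2 x}\mathbf{x}_2 + \cdots + c_ne^{\lambda_n x}\mathbf{x}_n
\]

where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A, and $\mathbf{x}_i$ is an eigenvector of A corresponding to $\lambda_i$.

Prove this result by tracing through the four-step procedure preceding Example 2 with

\[
D = \begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix}
\quad\text{and}\quad
P = [\mathbf{x}_1 \mid \mathbf{x}_2 \mid \cdots \mid \mathbf{x}_n]
\]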


11. Consider the system of differential equations $\mathbf{y}' = A\mathbf{y}$, where A is a $2 \times 2$ matrix. For what values of
$a_{11}, a_{12}, a_{21}, a_{22}$ do the component solutions $y_1(t)$, $y_2(t)$ tend to zero as $t \to \infty$? In particular, what must be
true about the determinant and the trace of A for this to happen?

12. Solve the nondiagonalizable system

\[
\begin{aligned}
y_1' &= y_1 + y_2 \\
y_2' &= y_2
\end{aligned}
\]


True-False Exercises 
In parts (a)—(e) determine whether the statement is true or false, and justify your answer. 


(a) Every system of differential equations y’ = Ay has a solution. 


Answer: 


False 


(b) If $\mathbf{x}' = A\mathbf{x}$ and $\mathbf{y}' = A\mathbf{y}$, then $\mathbf{x} = \mathbf{y}$.
Answer: 


False 


(c) If $\mathbf{x}' = A\mathbf{x}$ and $\mathbf{y}' = A\mathbf{y}$, then $(c\mathbf{x} + d\mathbf{y})' = A(c\mathbf{x} + d\mathbf{y})$ for all scalars c and d.


Answer: 


True 


(d) If A is a square matrix with distinct real eigenvalues, then it is possible to solve x’ = Ax by diagonalization. 
Answer: 


True 


(e) If A and P are similar matrices, then $\mathbf{y}' = A\mathbf{y}$ and $\mathbf{u}' = P\mathbf{u}$ have the same solutions.


Answer: 


False 




Chapter 5 Supplementary Exercises 


1. (a) Show that if $0 < \theta < \pi$, then

\[
A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\]

has no eigenvalues and consequently no eigenvectors.

(b) Give a geometric explanation of the result in part (a).
Answer:

(b) The transformation rotates vectors through the angle $\theta$; therefore, if $0 < \theta < \pi$, then no nonzero vector
is transformed into a vector in the same or opposite direction.


2. Find the eigenvalues of

\[
A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ k^3 & -3k^2 & 3k \end{bmatrix}
\]

3. (a) Show that if D is a diagonal matrix with nonnegative entries on the main diagonal, then there is a
matrix S such that $S^2 = D$.

(b) Show that if A is a diagonalizable matrix with nonnegative eigenvalues, then there is a matrix S such
that $S^2 = A$.
(c) Find a matrix S such that $S^2 = A$, given that

\[
A = \begin{bmatrix} 1 & 3 & 1 \\ 0 & 4 & 5 \\ 0 & 0 & 9 \end{bmatrix}
\]
Answer:
(c)
\[
S = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{bmatrix}
\]


4. Prove: If A is a square matrix, then A and $A^T$ have the same characteristic polynomial.

5. Prove: If A is a square matrix and $p(\lambda) = \det(\lambda I - A)$ is the characteristic polynomial of A, then the
coefficient of $\lambda^{n-1}$ in $p(\lambda)$ is the negative of the trace of A.


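6. Prove: If $b \neq 0$, then

\[
A = \begin{bmatrix} a & b \\ 0 & a \end{bmatrix}
\]

is not diagonalizable.

7. In advanced linear algebra, one proves the Cayley-Hamilton Theorem, which states that a square matrix
A satisfies its characteristic equation; that is, if

\[
c_0 + c_1\lambda + c_2\lambda^2 + \cdots + c_{n-1}\lambda^{n-1} + \lambda^n = 0
\]

is the characteristic equation of A, then

\[
c_0I + c_1A + c_2A^2 + \cdots + c_{n-1}A^{n-1} + A^n = 0
\]

Verify this result for

\[
\text{(a)}\;\; A = \begin{bmatrix} 3 & 6 \\ 1 & 2 \end{bmatrix}
\qquad
\text{(b)}\;\; A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & -3 & 3 \end{bmatrix}
\]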


In Exercises 8—10, use the Cayley—Hamilton Theorem, stated in Exercise 7. 


8. (a) Use Exercise 18 of Section 5.1 to prove the Cayley-Hamilton Theorem for $2 \times 2$ matrices.
(b) Prove the Cayley-Hamilton Theorem for $n \times n$ diagonalizable matrices.
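9. The Cayley-Hamilton Theorem provides a method for calculating powers of a matrix. For example, if A
is a $2 \times 2$ matrix with characteristic equation

\[
c_0 + c_1\lambda + \lambda^2 = 0
\]

then $c_0I + c_1A + A^2 = 0$, so

\[
A^2 = -c_1A - c_0I
\]

Multiplying through by A yields $A^3 = -c_1A^2 - c_0A$, which expresses $A^3$ in terms of $A^2$ and A, and
multiplying through by $A^2$ yields $A^4 = -c_1A^3 - c_0A^2$, which expresses $A^4$ in terms of $A^3$ and $A^2$.
Continuing in this way, we can calculate successive powers of A by expressing them in terms of lower
powers. Use this procedure to calculate $A^2$, $A^3$, $A^4$, and $A^5$ for

\[
A = \begin{bmatrix} 3 & 6 \\ 1 & 2 \end{bmatrix}
\]
Answer:

\[
A^2 = \begin{bmatrix} 15 & 30 \\ 5 & 10 \end{bmatrix}, \quad
A^3 = \begin{bmatrix} 75 & 150 \\ 25 & 50 \end{bmatrix}, \quad
A^4 = \begin{bmatrix} 375 & 750 \\ 125 & 250 \end{bmatrix}, \quad
A^5 = \begin{bmatrix} 1875 & 3750 \\ 625 & 1250 \end{bmatrix}
\]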
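
The recurrence described in Exercise 9 is easy to mechanize. The sketch below is one possible illustration, with a hypothetical function name; it assumes a $2 \times 2$ matrix, for which the characteristic equation is $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$, so that $c_1 = -\operatorname{tr}(A)$ and $c_0 = \det(A)$.

```python
def powers_by_cayley_hamilton(A, highest):
    """Return [A^1, A^2, ..., A^highest] for a 2x2 matrix A, using the
    recurrence A^(k+1) = -c1 * A^k - c0 * A^(k-1)."""
    (p, q), (r, s) = A
    c1 = -(p + s)              # coefficient of lambda: minus the trace
    c0 = p * s - q * r         # constant term: the determinant
    prev = [[1, 0], [0, 1]]    # A^0 = I
    curr = [row[:] for row in A]
    result = [curr]
    for _ in range(highest - 1):
        nxt = [[-c1 * curr[i][j] - c0 * prev[i][j] for j in range(2)]
               for i in range(2)]
        prev, curr = curr, nxt
        result.append(curr)
    return result

powers = powers_by_cayley_hamilton([[3, 6], [1, 2]], 5)
for k, M in enumerate(powers, start=1):
    print(k, M)
```

No matrix multiplication is performed; each new power is a linear combination of the two powers before it.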


10. Use the method of the preceding exercise to calculate $A^3$ and $A^4$ for

\[
A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & -3 & 3 \end{bmatrix}
\]


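11. Find the eigenvalues of the matrix

\[
A = \begin{bmatrix}
c_1 & c_2 & \cdots & c_n \\
c_1 & c_2 & \cdots & c_n \\
\vdots & \vdots & & \vdots \\
c_1 & c_2 & \cdots & c_n
\end{bmatrix}
\]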

Answer: 


0, tr(A) 


12. (a) It was shown in Exercise 17 of Section 5.1 that if A is an $n \times n$ matrix, then the coefficient of $\lambda^n$ in
the characteristic polynomial of A is 1. (A polynomial with this property is called monic.) Show that
the matrix

\[
\begin{bmatrix}
0 & 0 & 0 & \cdots & 0 & -c_0 \\
1 & 0 & 0 & \cdots & 0 & -c_1 \\
0 & 1 & 0 & \cdots & 0 & -c_2 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & -c_{n-1}
\end{bmatrix}
\]

has characteristic polynomial

\[
p(\lambda) = c_0 + c_1\lambda + \cdots + c_{n-1}\lambda^{n-1} + \lambda^n
\]

This shows that every monic polynomial is the characteristic polynomial of some matrix. The matrix
in this example is called the companion matrix of $p(\lambda)$. [Hint: Evaluate all determinants in the
problem by adding a multiple of the second row to the first to introduce a zero at the top of the first
column, and then expanding by cofactors along the first column.]

(b) Find a matrix with characteristic polynomial

\[
p(\lambda) = 1 - 2\lambda + \lambda^2 + 3\lambda^3 + \lambda^4
\]


13. A square matrix A is called nilpotent if A” = 0 for some positive integer n. What can you say about the 
eigenvalues of a nilpotent matrix? 


Answer: 


They are all 0. 
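14. Prove: If A is an $n \times n$ matrix and n is odd, then A has at least one real eigenvalue.

15. Find a $3 \times 3$ matrix A that has eigenvalues $\lambda = 0$, 1, and $-1$ with corresponding eigenvectors

\[
\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \quad
\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \quad
\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}
\]
respectively.
Answer:
\[
A = \begin{bmatrix} 1 & 0 & 0 \\ -1 & -\frac{1}{2} & -\frac{1}{2} \\[1mm] 1 & -\frac{1}{2} & -\frac{1}{2} \end{bmatrix}
\]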


16. Suppose that a $4 \times 4$ matrix A has eigenvalues $\lambda_1 = 1$, $\lambda_2 = -2$, $\lambda_3 = 3$, and $\lambda_4 = -3$.
(a) Use the method of Exercise 16 of Section 5.1 to find det(A).
(b) Use Exercise 5 above to find tr(A).

17. Let A be a square matrix such that $A^3 = A$. What can you say about the eigenvalues of A?

Answer:

They are all 0, 1, or $-1$.


18. (a) Solve the system

\[
\begin{aligned}
y_1' &= y_1 + 3y_2 \\
y_2' &= 2y_1 + 4y_2
\end{aligned}
\]

(b) Find the solution satisfying the initial conditions $y_1(0) = 5$ and $y_2(0) = 6$.




CHAPTER 6

Inner Product Spaces

CHAPTER CONTENTS

6.1. Inner Products
6.2. Angle and Orthogonality in Inner Product Spaces
6.3. Gram–Schmidt Process; QR-Decomposition
6.4. Best Approximation; Least Squares
6.5. Least Squares Fitting to Data
6.6. Function Approximation; Fourier Series


INTRODUCTION 


In Chapter 3 we defined the dot product of vectors in $R^n$, and we used that concept to
define notions of length, angle, distance, and orthogonality. In this chapter we will
generalize those ideas so they are applicable in any vector space, not just $R^n$. We will also
discuss various applications of these ideas.




6.1 Inner Products 


In this section we will use the most important properties of the dot product on $R^n$ as axioms, which, if satisfied by the vectors
in a vector space V, will enable us to extend the notions of length, distance, angle, and perpendicularity to general vector
spaces.


General Inner Products 


In Definition 4 of Section 3.2 we defined the dot product of two vectors in $R^n$, and in Theorem 3.2.2 we listed four
fundamental properties of such products. Our first goal in this section is to extend the notion of a dot product to general real
vector spaces by using those four properties as axioms. We make the following definition.

Note that Definition 1 applies only to real vector
spaces. A definition of inner products on complex
vector spaces is given in the exercises. Since we will
have little need for complex vector spaces from this
point on, you can assume that all vector spaces under
discussion are real, even though some of the theorems
are also valid in complex vector spaces.


DEFINITION 1 


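An inner product on a real vector space V is a function that associates a real number $\langle \mathbf{u}, \mathbf{v} \rangle$ with each pair of vectors in
V in such a way that the following axioms are satisfied for all vectors u, v, and w in V and all scalars k.

1. $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$ [Symmetry axiom]

2. $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle$ [Additivity axiom]

3. $\langle k\mathbf{u}, \mathbf{v} \rangle = k\langle \mathbf{u}, \mathbf{v} \rangle$ [Homogeneity axiom]

4. $\langle \mathbf{v}, \mathbf{v} \rangle \ge 0$ and $\langle \mathbf{v}, \mathbf{v} \rangle = 0$ if and only if $\mathbf{v} = \mathbf{0}$ [Positivity axiom]

A real vector space with an inner product is called a real inner product space.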


Because the axioms for a real inner product space are based on properties of the dot product, these inner product space axioms
will be satisfied automatically if we define the inner product of two vectors u and v in $R^n$ to be

\[
\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n
\]

This inner product is commonly called the Euclidean inner product (or the standard inner product) on $R^n$ to distinguish it
from other possible inner products that might be defined on $R^n$. We call $R^n$ with the Euclidean inner product Euclidean
n-space.


Inner products can be used to define notions of norm and distance in a general inner product space just as we did with dot
products in $R^n$. Recall from Formulas 11 and 19 of Section 3.2 that if u and v are vectors in Euclidean n-space, then norm and
distance can be expressed in terms of the dot product as

\[
\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} \quad\text{and}\quad d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{(\mathbf{u} - \mathbf{v}) \cdot (\mathbf{u} - \mathbf{v})}
\]


Motivated by these formulas we make the following definition. 


DEFINITION 2 


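If V is a real inner product space, then the norm (or length) of a vector v in V is denoted by $\|\mathbf{v}\|$ and is defined by

\[
\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}
\]

and the distance between two vectors is denoted by $d(\mathbf{u}, \mathbf{v})$ and is defined by

\[
d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{\langle \mathbf{u} - \mathbf{v}, \mathbf{u} - \mathbf{v} \rangle}
\]

A vector of norm 1 is called a unit vector.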


The following theorem, which we state without proof, shows that norms and distances in real inner product spaces have many 
of the properties that you might expect. 


THEOREM 6.1.1 


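If u and v are vectors in a real inner product space V, and if k is a scalar, then:
(a) $\|\mathbf{v}\| \ge 0$ with equality if and only if $\mathbf{v} = \mathbf{0}$.

(b) $\|k\mathbf{v}\| = |k|\,\|\mathbf{v}\|$.

(c) $d(\mathbf{u}, \mathbf{v}) = d(\mathbf{v}, \mathbf{u})$.

(d) $d(\mathbf{u}, \mathbf{v}) \ge 0$ with equality if and only if $\mathbf{u} = \mathbf{v}$.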


Although the Euclidean inner product is the most important inner product on $R^n$, there are various applications in which it is
desirable to modify it by weighting each term differently. More precisely, if

\[
w_1, w_2, \ldots, w_n
\]

are positive real numbers, which we will call weights, and if $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are vectors in $R^n$,
then it can be shown that the formula

\[
\langle \mathbf{u}, \mathbf{v} \rangle = w_1u_1v_1 + w_2u_2v_2 + \cdots + w_nu_nv_n \tag{1}
\]

defines an inner product on $R^n$ that we call the weighted Euclidean inner product with weights $w_1, w_2, \ldots, w_n$.

Note that the standard Euclidean inner product is the
special case of the weighted Euclidean inner product in
which all the weights are 1.


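EXAMPLE 1 Weighted Euclidean Inner Product
Let $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$ be vectors in $R^2$. Verify that the weighted Euclidean inner product

\[
\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 2u_2v_2 \tag{2}
\]

satisfies the four inner product axioms.

Solution Axiom 1: Interchanging u and v in Formula 2 does not change the sum on the right side, so
$\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$.

Axiom 2: If $\mathbf{w} = (w_1, w_2)$, then

\[
\begin{aligned}
\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle &= 3(u_1 + v_1)w_1 + 2(u_2 + v_2)w_2 \\
&= 3(u_1w_1 + v_1w_1) + 2(u_2w_2 + v_2w_2) \\
&= (3u_1w_1 + 2u_2w_2) + (3v_1w_1 + 2v_2w_2) \\
&= \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle
\end{aligned}
\]

Axiom 3:

\[
\begin{aligned}
\langle k\mathbf{u}, \mathbf{v} \rangle &= 3(ku_1)v_1 + 2(ku_2)v_2 \\
&= k(3u_1v_1 + 2u_2v_2) \\
&= k\langle \mathbf{u}, \mathbf{v} \rangle
\end{aligned}
\]

Axiom 4: $\langle \mathbf{v}, \mathbf{v} \rangle = 3(v_1v_1) + 2(v_2v_2) = 3v_1^2 + 2v_2^2 \ge 0$ with equality if and only if $v_1 = v_2 = 0$; that is, if
and only if $\mathbf{v} = \mathbf{0}$.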


In Example 1, we are using subscripted w's to
denote the components of the vector w, not the
weights. The weights are the numbers 3 and 2 in
Formula 2.


An Application of Weighted Euclidean Inner Products 


To illustrate one way in which a weighted Euclidean inner product can arise, suppose that some physical experiment has n
possible numerical outcomes

\[
x_1, x_2, \ldots, x_n
\]

and that a series of m repetitions of the experiment yields these values with various frequencies. Specifically, suppose that $x_1$
occurs $f_1$ times, $x_2$ occurs $f_2$ times, and so forth. Since there are a total of m repetitions of the experiment, it follows that

\[
f_1 + f_2 + \cdots + f_n = m
\]

Thus, the arithmetic average of the observed numerical values (denoted by $\bar{x}$) is

\[
\bar{x} = \frac{f_1x_1 + f_2x_2 + \cdots + f_nx_n}{f_1 + f_2 + \cdots + f_n} = \frac{1}{m}(f_1x_1 + f_2x_2 + \cdots + f_nx_n) \tag{3}
\]

If we let

\[
\mathbf{f} = (f_1, f_2, \ldots, f_n), \quad \mathbf{x} = (x_1, x_2, \ldots, x_n), \quad w_1 = w_2 = \cdots = w_n = 1/m
\]

then 3 can be expressed as the weighted Euclidean inner product

\[
\bar{x} = \langle \mathbf{f}, \mathbf{x} \rangle = w_1f_1x_1 + w_2f_2x_2 + \cdots + w_nf_nx_n
\]
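
As a small numerical illustration of this identity, the following Python sketch computes an average both directly and as a weighted Euclidean inner product with all weights equal to 1/m. The outcome values and frequencies are made-up sample data, and the function name `weighted_inner` is a hypothetical choice.

```python
from fractions import Fraction as F

def weighted_inner(u, v, weights):
    """Weighted Euclidean inner product w1*u1*v1 + ... + wn*un*vn."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))

outcomes = (1, 2, 3)         # hypothetical values x1, x2, x3
frequencies = (2, 5, 3)      # hypothetical counts f1, f2, f3
m = sum(frequencies)         # total number of repetitions

weights = tuple(F(1, m) for _ in outcomes)
mean_as_inner_product = weighted_inner(frequencies, outcomes, weights)

# Direct average over the expanded list of observations, for comparison.
observations = [x for x, f in zip(outcomes, frequencies) for _ in range(f)]
direct_mean = F(sum(observations), len(observations))

print(mean_as_inner_product, direct_mean)
```

Both computations give 21/10 for this sample data.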


EXAMPLE 2 Using a Weighted Euclidean Inner Product 


It is important to keep in mind that norm and distance depend on the inner product being used. If the inner product is changed, then the norms and distances between vectors also change. For example, for the vectors $u = (1, 0)$ and $v = (0, 1)$ in $R^2$ with the Euclidean inner product we have
\[ \|u\| = \sqrt{1^2 + 0^2} = 1 \]
and
\[ d(u, v) = \|u - v\| = \|(1, -1)\| = \sqrt{1^2 + (-1)^2} = \sqrt{2} \]
but if we change to the weighted Euclidean inner product
\[ \langle u, v \rangle = 3u_1v_1 + 2u_2v_2 \]
we have
\[ \|u\| = \langle u, u \rangle^{1/2} = [3(1)(1) + 2(0)(0)]^{1/2} = \sqrt{3} \]
and
\[ d(u, v) = \|u - v\| = \langle (1, -1), (1, -1) \rangle^{1/2} = [3(1)(1) + 2(-1)(-1)]^{1/2} = \sqrt{5} \]
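The dependence of norm and distance on the inner product can be reproduced in a few lines. This is a sketch under the assumption that an inner product on $R^2$ is represented by its pair of weights; the function names `inner`, `norm`, and `dist` are illustrative, not notation from the text.

```python
import math

def inner(u, v, weights):
    """Weighted Euclidean inner product; weights (1, 1) gives the dot product."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))

def norm(u, weights):
    """Norm induced by the inner product: ||u|| = <u, u>^(1/2)."""
    return math.sqrt(inner(u, u, weights))

def dist(u, v, weights):
    """Distance induced by the inner product: d(u, v) = ||u - v||."""
    diff = tuple(a - b for a, b in zip(u, v))
    return norm(diff, weights)

u, v = (1, 0), (0, 1)
euclidean, weighted = (1, 1), (3, 2)

print(norm(u, euclidean), dist(u, v, euclidean))  # 1 and sqrt(2)
print(norm(u, weighted), dist(u, v, weighted))    # sqrt(3) and sqrt(5)
```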


Unit Circles and Spheres in Inner Product Spaces 


If V is an inner product space, then the set of points in V that satisfy
\[ \|u\| = 1 \]
is called the unit sphere or sometimes the unit circle in V.

EXAMPLE 3 Unusual Unit Circles in $R^2$


(a) Sketch the unit circle in an xy-coordinate system in $R^2$ using the Euclidean inner product $\langle u, v \rangle = u_1v_1 + u_2v_2$.
(b) Sketch the unit circle in an xy-coordinate system in $R^2$ using the weighted Euclidean inner product $\langle u, v \rangle = \frac{1}{9}u_1v_1 + \frac{1}{4}u_2v_2$.

Solution
(a) If $u = (x, y)$, then $\|u\| = \langle u, u \rangle^{1/2} = \sqrt{x^2 + y^2}$, so the equation of the unit circle is $\sqrt{x^2 + y^2} = 1$, or, on squaring both sides,
\[ x^2 + y^2 = 1 \]
As expected, the graph of this equation is a circle of radius 1 centered at the origin (Figure 6.1.1a).
(b) If $u = (x, y)$, then $\|u\| = \langle u, u \rangle^{1/2} = \sqrt{\frac{1}{9}x^2 + \frac{1}{4}y^2}$, so the equation of the unit circle is $\sqrt{\frac{1}{9}x^2 + \frac{1}{4}y^2} = 1$, or, on squaring both sides,
\[ \frac{x^2}{9} + \frac{y^2}{4} = 1 \]
The graph of this equation is the ellipse shown in Figure 6.1.1b.


Figure 6.1.1 (a) The unit circle using the standard Euclidean inner product. (b) The unit circle using a weighted Euclidean inner product.


Remark It may seem odd that the "unit circle" in the second part of the last example turned out to have an elliptical shape. This will make more sense if you think of circles and spheres in general vector spaces algebraically ($\|u\| = 1$) rather than geometrically. The change in geometry occurs because the norm, not being Euclidean, has the effect of distorting the space that we are used to seeing through "Euclidean eyes."


Inner Products Generated by Matrices 


The Euclidean inner product and the weighted Euclidean inner products are special cases of a general class of inner products on $R^n$ called matrix inner products. To define this class of inner products, let u and v be vectors in $R^n$ that are expressed in column form, and let A be an invertible $n \times n$ matrix. It can be shown (Exercise 31) that if $u \cdot v$ is the Euclidean inner product on $R^n$, then the formula
\[ \langle u, v \rangle = Au \cdot Av \qquad (4) \]
also defines an inner product; it is called the inner product on $R^n$ generated by A.

Recall from Table 1 of Section 3.2 that if u and v are in column form, then $u \cdot v$ can be written as $v^Tu$, from which it follows that (4) can be expressed as
\[ \langle u, v \rangle = (Av)^TAu \]
or, equivalently, as
\[ \langle u, v \rangle = v^TA^TAu \qquad (5) \]
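Formulas 4 and 5 describe the same number computed two ways. The sketch below, in plain Python with small helper functions whose names are assumptions made for illustration, evaluates both for a sample invertible matrix and confirms that they agree. The matrix and vectors are arbitrary choices.

```python
def dot(x, y):
    """Euclidean dot product of two sequences."""
    return sum(a * b for a, b in zip(x, y))

def mat_vec(A, x):
    """Matrix-vector product Ax, with A stored as a list of rows."""
    return [dot(row, x) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    """Matrix product AB."""
    return [[dot(row, col) for col in zip(*B)] for row in A]

def inner_formula4(A, u, v):
    """<u, v> = Au . Av"""
    return dot(mat_vec(A, u), mat_vec(A, v))

def inner_formula5(A, u, v):
    """<u, v> = v^T (A^T A) u"""
    gram = mat_mul(transpose(A), A)
    return dot(v, mat_vec(gram, u))

A = [[2, 1], [-1, 3]]      # sample invertible matrix (determinant 7)
u, v = (0, -3), (6, 2)     # sample vectors

print(inner_formula4(A, u, v), inner_formula5(A, u, v))
```

Computing $A^TA$ once and reusing it is the more economical route when many inner products are needed with the same generating matrix.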


EXAMPLE 4 Matrices Generating Weighted Euclidean Inner Products 


The standard Euclidean and weighted Euclidean inner products are examples of matrix inner products. The standard Euclidean inner product on $R^n$ is generated by the $n \times n$ identity matrix, since setting $A = I$ in Formula 4 yields
\[ \langle u, v \rangle = Iu \cdot Iv = u \cdot v \]
and the weighted Euclidean inner product
\[ \langle u, v \rangle = w_1u_1v_1 + w_2u_2v_2 + \cdots + w_nu_nv_n \qquad (6) \]
is generated by the matrix
\[
A = \begin{bmatrix}
\sqrt{w_1} & 0 & 0 & \cdots & 0 \\
0 & \sqrt{w_2} & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & \sqrt{w_n}
\end{bmatrix} \qquad (7)
\]
This can be seen by first observing that $A^TA$ is the $n \times n$ diagonal matrix whose diagonal entries are the weights $w_1, w_2, \ldots, w_n$ and then observing that (5) simplifies to (6) when A is the matrix in Formula 7.


EXAMPLE 5 Example 1 Revisited

The weighted Euclidean inner product $\langle u, v \rangle = 3u_1v_1 + 2u_2v_2$ discussed in Example 1 is the inner product on $R^2$ generated by
\[ A = \begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{2} \end{bmatrix} \]

Every diagonal matrix with positive diagonal entries generates a weighted inner product. Why?


Other Examples of Inner Products 


So far, we have only considered examples of inner products on $R^n$. We will now consider examples of inner products on some of the other kinds of vector spaces that we discussed earlier.


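EXAMPLE 6 An Inner Product on $M_{nn}$

If U and V are $n \times n$ matrices, then the formula
\[ \langle U, V \rangle = \operatorname{tr}(U^TV) \qquad (8) \]
defines an inner product on the vector space $M_{nn}$ (see Definition 8 of Section 1.3 for a definition of trace). This can be proved by confirming that the four inner product space axioms are satisfied, but you can visualize why this is so by computing (8) for the $2 \times 2$ matrices
\[ U = \begin{bmatrix} u_1 & u_2 \\ u_3 & u_4 \end{bmatrix} \quad \text{and} \quad V = \begin{bmatrix} v_1 & v_2 \\ v_3 & v_4 \end{bmatrix} \]
This yields
\[ \langle U, V \rangle = \operatorname{tr}(U^TV) = u_1v_1 + u_2v_2 + u_3v_3 + u_4v_4 \]
which is just the dot product of the corresponding entries in the two matrices. For example, if
\[ U = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad \text{and} \quad V = \begin{bmatrix} -1 & 0 \\ 3 & 2 \end{bmatrix} \]
then
\[ \langle U, V \rangle = 1(-1) + 2(0) + 3(3) + 4(2) = 16 \]
The norm of a matrix U relative to this inner product is
\[ \|U\| = \langle U, U \rangle^{1/2} = \sqrt{u_1^2 + u_2^2 + u_3^2 + u_4^2} \]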
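The claim that $\operatorname{tr}(U^TV)$ equals the entrywise dot product can be checked directly on the matrices of Example 6. The following Python sketch uses hypothetical helper names (`trace_inner`, `entrywise_inner`) and stores matrices as lists of rows.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def trace_inner(U, V):
    """<U, V> = tr(U^T V), as in Formula 8."""
    return trace(mat_mul(transpose(U), V))

def entrywise_inner(U, V):
    """Sum of products of corresponding entries."""
    return sum(a * b for row_u, row_v in zip(U, V) for a, b in zip(row_u, row_v))

U = [[1, 2], [3, 4]]
V = [[-1, 0], [3, 2]]

print(trace_inner(U, V), entrywise_inner(U, V))
norm_U = math.sqrt(trace_inner(U, U))   # sqrt(1 + 4 + 9 + 16)
print(norm_U)
```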


EXAMPLE 7 The Standard Inner Product on $P_n$

If
\[ p = a_0 + a_1x + \cdots + a_nx^n \quad \text{and} \quad q = b_0 + b_1x + \cdots + b_nx^n \]
are polynomials in $P_n$, then the following formula defines an inner product on $P_n$ (verify) that we will call the standard inner product on this space:
\[ \langle p, q \rangle = a_0b_0 + a_1b_1 + \cdots + a_nb_n \qquad (9) \]
The norm of a polynomial p relative to this inner product is
\[ \|p\| = \sqrt{\langle p, p \rangle} = \sqrt{a_0^2 + a_1^2 + \cdots + a_n^2} \]


EXAMPLE 8 The Evaluation Inner Product on $P_n$

If
\[ p = p(x) = a_0 + a_1x + \cdots + a_nx^n \quad \text{and} \quad q = q(x) = b_0 + b_1x + \cdots + b_nx^n \]
are polynomials in $P_n$, and if $x_0, x_1, \ldots, x_n$ are distinct real numbers (called sample points), then the formula
\[ \langle p, q \rangle = p(x_0)q(x_0) + p(x_1)q(x_1) + \cdots + p(x_n)q(x_n) \qquad (10) \]
defines an inner product on $P_n$ called the evaluation inner product at $x_0, x_1, \ldots, x_n$. Algebraically, this can be viewed as the dot product in $R^{n+1}$ of the $(n+1)$-tuples
\[ (p(x_0), p(x_1), \ldots, p(x_n)) \quad \text{and} \quad (q(x_0), q(x_1), \ldots, q(x_n)) \]
and hence the first three inner product axioms follow from properties of the dot product. The fourth inner product axiom follows from the fact that
\[ \langle p, p \rangle = [p(x_0)]^2 + [p(x_1)]^2 + \cdots + [p(x_n)]^2 \ge 0 \]
with equality holding if and only if
\[ p(x_0) = p(x_1) = \cdots = p(x_n) = 0 \]
But a nonzero polynomial of degree n or less can have at most n distinct roots, and here p vanishes at $n + 1$ distinct points, so it must be that $p = \mathbf{0}$, which proves that the fourth inner product axiom holds.

The norm of a polynomial p relative to the evaluation inner product is
\[ \|p\| = \sqrt{\langle p, p \rangle} = \sqrt{[p(x_0)]^2 + [p(x_1)]^2 + \cdots + [p(x_n)]^2} \qquad (11) \]


EXAMPLE 9 Working with the Evaluation Inner Product 


Let $P_2$ have the evaluation inner product at the points
\[ x_0 = -2, \quad x_1 = 0, \quad \text{and} \quad x_2 = 2 \]
Compute $\langle p, q \rangle$ and $\|p\|$ for the polynomials $p = p(x) = x^2$ and $q = q(x) = 1 + x$.

Solution It follows from (10) and (11) that
\[ \langle p, q \rangle = p(-2)q(-2) + p(0)q(0) + p(2)q(2) = (4)(-1) + (0)(1) + (4)(3) = 8 \]
\[ \|p\| = \sqrt{[p(x_0)]^2 + [p(x_1)]^2 + [p(x_2)]^2} = \sqrt{[p(-2)]^2 + [p(0)]^2 + [p(2)]^2} = \sqrt{4^2 + 0^2 + 4^2} = \sqrt{32} = 4\sqrt{2} \]
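The computation in Example 9 translates almost line for line into code. In this sketch, polynomials are represented as ordinary Python functions and `evaluation_inner` is a hypothetical helper that implements Formula 10 for a given tuple of sample points.

```python
import math

def evaluation_inner(p, q, points):
    """Evaluation inner product: sum of p(x_i) * q(x_i) over the sample points."""
    return sum(p(x) * q(x) for x in points)

def p(x):
    return x ** 2

def q(x):
    return 1 + x

points = (-2, 0, 2)

inner_pq = evaluation_inner(p, q, points)
norm_p = math.sqrt(evaluation_inner(p, p, points))
print(inner_pq, norm_p)   # 8 and 4*sqrt(2)
```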


CALCULUS REQUIRED 
EXAMPLE 10 An Inner Product on $C[a, b]$

Let $f = f(x)$ and $g = g(x)$ be two functions in $C[a, b]$ and define
\[ \langle f, g \rangle = \int_a^b f(x)g(x)\,dx \qquad (12) \]
We will show that this formula defines an inner product on $C[a, b]$ by verifying the four inner product axioms for functions $f = f(x)$, $g = g(x)$, and $h = h(x)$ in $C[a, b]$:

1.
\[ \langle f, g \rangle = \int_a^b f(x)g(x)\,dx = \int_a^b g(x)f(x)\,dx = \langle g, f \rangle \]
which proves that Axiom 1 holds.

2.
\[
\begin{aligned}
\langle f + g, h \rangle &= \int_a^b (f(x) + g(x))h(x)\,dx \\
&= \int_a^b f(x)h(x)\,dx + \int_a^b g(x)h(x)\,dx \\
&= \langle f, h \rangle + \langle g, h \rangle
\end{aligned}
\]
which proves that Axiom 2 holds.

3.
\[ \langle kf, g \rangle = \int_a^b kf(x)g(x)\,dx = k\int_a^b f(x)g(x)\,dx = k\langle f, g \rangle \]
which proves that Axiom 3 holds.

4. If $f = f(x)$ is any function in $C[a, b]$, then
\[ \langle f, f \rangle = \int_a^b f^2(x)\,dx \ge 0 \qquad (13) \]
since $f^2(x) \ge 0$ for all x in the interval $[a, b]$. Moreover, because f is continuous on $[a, b]$, the equality holds in Formula 13 if and only if the function f is identically zero on $[a, b]$, that is, if and only if $f = \mathbf{0}$; and this proves that Axiom 4 holds.


CALCULUS REQUIRED 


EXAMPLE 11 Norm of a Vector in $C[a, b]$

If $C[a, b]$ has the inner product that was defined in Example 10, then the norm of a function $f = f(x)$ relative to this inner product is
\[ \|f\| = \langle f, f \rangle^{1/2} = \sqrt{\int_a^b f^2(x)\,dx} \qquad (14) \]
and the unit sphere in this space consists of all functions f in $C[a, b]$ that satisfy the equation
\[ \int_a^b f^2(x)\,dx = 1 \]


Remark Note that the vector space $P_n$ is a subspace of $C[a, b]$ because polynomials are continuous functions. Thus, Formula 12 defines an inner product on $P_n$.

Remark Recall from calculus that the arc length of a curve $y = f(x)$ over an interval $[a, b]$ is given by the formula
\[ L = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx \qquad (15) \]
Do not confuse this concept of arc length with $\|f\|$, which is the length (norm) of f when f is viewed as a vector in $C[a, b]$. Formulas 14 and 15 are quite different.
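A quick numerical comparison shows how different Formulas 14 and 15 are. The sketch below approximates both integrals with composite Simpson's rule for the sample function $f(x) = x$ on $[0, 1]$; the choice of function, the interval, and the helper name `simpson` are assumptions made for illustration, and the derivative is supplied by hand.

```python
import math

def simpson(func, a, b, n=100):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def f(x):
    return x

def f_prime(x):
    return 1.0          # derivative of f, entered by hand

a, b = 0.0, 1.0
norm_f = math.sqrt(simpson(lambda x: f(x) ** 2, a, b))                     # Formula 14
arc_length = simpson(lambda x: math.sqrt(1 + f_prime(x) ** 2), a, b)       # Formula 15
print(norm_f, arc_length)   # about 0.577 versus about 1.414
```

For this function the norm is $1/\sqrt{3}$ while the arc length is $\sqrt{2}$, so the two notions of "length" clearly measure different things.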


Algebraic Properties of Inner Products 


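The following theorem lists some of the algebraic properties of inner products that follow from the inner product axioms. This result is a generalization of Theorem 3.2.3, which applied only to the dot product on $R^n$.

THEOREM 6.1.2

If u, v, and w are vectors in a real inner product space V, and if k is a scalar, then
(a) $\langle \mathbf{0}, v \rangle = \langle v, \mathbf{0} \rangle = 0$
(b) $\langle u, v + w \rangle = \langle u, v \rangle + \langle u, w \rangle$
(c) $\langle u, v - w \rangle = \langle u, v \rangle - \langle u, w \rangle$
(d) $\langle u - v, w \rangle = \langle u, w \rangle - \langle v, w \rangle$
(e) $k\langle u, v \rangle = \langle u, kv \rangle$

Proof We will prove part (b) and leave the proofs of the remaining parts as exercises.
\[
\begin{aligned}
\langle u, v + w \rangle &= \langle v + w, u \rangle && \text{[By symmetry]} \\
&= \langle v, u \rangle + \langle w, u \rangle && \text{[By additivity]} \\
&= \langle u, v \rangle + \langle u, w \rangle && \text{[By symmetry]}
\end{aligned}
\]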


The following example illustrates how Theorem 6.1.2 and the defining properties of inner products can be used to perform 
algebraic computations with inner products. As you read through the example, you will find it instructive to justify the steps. 


EXAMPLE 12 Calculating with Inner Products 


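\[
\begin{aligned}
\langle u - 2v, 3u + 4v \rangle &= \langle u, 3u + 4v \rangle - \langle 2v, 3u + 4v \rangle \\
&= \langle u, 3u \rangle + \langle u, 4v \rangle - \langle 2v, 3u \rangle - \langle 2v, 4v \rangle \\
&= 3\langle u, u \rangle + 4\langle u, v \rangle - 6\langle v, u \rangle - 8\langle v, v \rangle \\
&= 3\|u\|^2 + 4\langle u, v \rangle - 6\langle u, v \rangle - 8\|v\|^2 \\
&= 3\|u\|^2 - 2\langle u, v \rangle - 8\|v\|^2
\end{aligned}
\]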
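The expansion in Example 12 holds in every real inner product space, so it can be sanity-checked with any particular inner product. This sketch uses the weighted inner product of Example 1 and arbitrary integer vectors; the helper names are illustrative assumptions.

```python
def inner(u, v, weights=(3, 2)):
    """Weighted Euclidean inner product <u, v> = 3*u1*v1 + 2*u2*v2."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))

def combine(a, u, b, v):
    """Linear combination a*u + b*v."""
    return tuple(a * x + b * y for x, y in zip(u, v))

u, v = (1, 2), (3, -1)   # arbitrary sample vectors

left = inner(combine(1, u, -2, v), combine(3, u, 4, v))      # <u - 2v, 3u + 4v>
right = 3 * inner(u, u) - 2 * inner(u, v) - 8 * inner(v, v)  # 3||u||^2 - 2<u,v> - 8||v||^2
print(left, right)
```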


Concept Review 


Inner product axioms 


Euclidean inner product 


Euclidean n-space 


Weighted Euclidean inner product 


Unit circle (sphere) 


Matrix inner product 


Norm in an inner product space 


Distance between two vectors in an inner product space 


Examples of inner products 


Properties of inner products 


Skills
• Compute the inner product of two vectors.
• Find the norm of a vector.
• Find the distance between two vectors.
• Show that a given formula defines an inner product.
• Show that a given formula does not define an inner product by demonstrating that at least one of the inner product space axioms fails.


Exercise Set 6.1 


1. Let $\langle u, v \rangle$ be the Euclidean inner product on $R^2$, and let $u = (1, 1)$, $v = (3, 2)$, $w = (0, -1)$, and $k = 3$. Compute the following.
(a) $\langle u, v \rangle$
(b) $\langle kv, w \rangle$
(c) $\langle u + v, w \rangle$
(d) $\|v\|$
(e) $d(u, v)$
(f) $\|u - kv\|$

Answer:
(a) 5
(b) $-6$
(c) $-3$
(d) $\sqrt{13}$
(e) $\sqrt{5}$
(f) $\sqrt{89}$


2. Repeat Exercise 1 for the weighted Euclidean inner product $\langle u, v \rangle = 2u_1v_1 + 3u_2v_2$.


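3. Let $\langle u, v \rangle$ be the Euclidean inner product on $R^2$, and let $u = (3, -2)$, $v = (4, 5)$, $w = (-1, 6)$, and $k = -4$. Verify the following.
(a) $\langle u, v \rangle = \langle v, u \rangle$
(b) $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$
(c) $\langle u, v + w \rangle = \langle u, v \rangle + \langle u, w \rangle$
(d) $\langle ku, v \rangle = k\langle u, v \rangle = \langle u, kv \rangle$
(e) $\langle \mathbf{0}, v \rangle = \langle v, \mathbf{0} \rangle = 0$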


Answer: 


(a) 2 
(b) 11 
(c) —13 
(d) —8 
(e) 0 


4. Repeat Exercise 3 for the weighted Euclidean inner product $\langle u, v \rangle = 4u_1v_1 + 5u_2v_2$.


“Let {u, ¥} be the inner product on R? generated by k if and letu= (2, 1), v= (= 1, 1), w= (0, — 1). Compute the 


following. 


(a) (U. ¥} 

(b) (¥. W} 

(c) {u + ¥, w} 
(d) II¥ll 

(e) d(v, w) 
() |v —wll? 


Answer: 


(a) —9 
(b) 1 
(c) —? 
(d) 1 
(e) | 


(f) 1 
: : i 2 1 0 
Repeat Exercise 5 for the inner product on 2 generated by eee, 


7. Compute (u, ¥} using the inner product in Example 6. 


(a). |3 =—2 _|=-1 3 
a=|3 sb ¥=| 1 | 


(bs) _[ 12] __[4 6 
a=| s}¥=|5 | 


Answer: 


(a) 3 
(b) 56 


8. Compute $\langle p, q \rangle$ using the inner product in Example 7.
(a) $p = -2 + x + 3x^2$, $q = 4 - 7x^2$
(b) $p = -5 + 2x + x^2$, $q = 3 + 2x - 4x^2$


9. (a) Use Formula 4 to show that $\langle u, v \rangle = 9u_1v_1 + 4u_2v_2$ is the inner product on $R^2$ generated by
\[ A = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \]
(b) Use the inner product in part (a) to compute $\langle u, v \rangle$ if $u = (-3, 2)$ and $v = (1, 7)$.
Answer:
(b) 29


10. (a) Use Formula 4 to show that
\[ \langle u, v \rangle = 5u_1v_1 - u_1v_2 - u_2v_1 + 10u_2v_2 \]
is the inner product on $R^2$ generated by
\[ A = \begin{bmatrix} 2 & 1 \\ -1 & 3 \end{bmatrix} \]
(b) Use the inner product in part (a) to compute $\langle u, v \rangle$ if $u = (0, -3)$ and $v = (6, 2)$.


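11. Let $u = (u_1, u_2)$ and $v = (v_1, v_2)$. In each part, the given expression is an inner product on $R^2$. Find a matrix that generates it.
(a) $\langle u, v \rangle = 3u_1v_1 + 5u_2v_2$
(b) $\langle u, v \rangle = 4u_1v_1 + 6u_2v_2$

Answer:
(a) $\begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{5} \end{bmatrix}$
(b) $\begin{bmatrix} 2 & 0 \\ 0 & \sqrt{6} \end{bmatrix}$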
12. Let $P_2$ have the inner product in Example 7. In each part, find $\|p\|$.
(a) $p = -2 + 3x + 2x^2$
(b) $p = 4 - 3x^2$


13. Let $M_{22}$ have the inner product in Example 6. In each part, find $\|A\|$.
(a) $A = \begin{bmatrix} -2 & 5 \\ 3 & 6 \end{bmatrix}$
(b) $A = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$

Answer:
(a) $\sqrt{74}$
(b) 0
14. Let $P_2$ have the inner product in Example 7. Find $d(p, q)$.
$p = 3 - x + x^2$, $q = 2 + 5x^2$


15. Let $M_{22}$ have the inner product in Example 6. Find $d(A, B)$.


Gna, (6 eae 
als abe al 


Answer: 


(a) {105 
(b) 47 


16. Let $P_2$ have the inner product of Example 9, and let $p = 1 + x + x^2$ and $q = 1 - 2x^2$. Compute the following.
(a) $\langle p, q \rangle$
(b) $\|p\|$
(c) $d(p, q)$


17. Let $P_3$ have the evaluation inner product at the sample points
\[ x_0 = -1, \quad x_1 = 0, \quad x_2 = 1, \quad x_3 = 2 \]
Find $\langle p, q \rangle$ and $\|p\|$ for $p = x + x^3$ and $q = 1 + x^2$.

Answer:
$\langle p, q \rangle = 50$, $\|p\| = 6\sqrt{3}$
18. In each part, use the given inner product on $R^2$ to find $\|w\|$, where $w = (-1, 3)$.
(a) the Euclidean inner product
(b) the weighted Euclidean inner product $\langle u, v \rangle = 3u_1v_1 + 2u_2v_2$, where $u = (u_1, u_2)$ and $v = (v_1, v_2)$
(c) the inner product generated by the matrix
\[ A = \begin{bmatrix} 1 & 2 \\ -1 & 3 \end{bmatrix} \]

19. Use the inner products in Exercise 18 to find $d(u, v)$ for $u = (-1, 2)$ and $v = (2, 5)$.

Answer:
(a) $3\sqrt{2}$
(b) $3\sqrt{5}$
(c) $3\sqrt{13}$

Suppose that u, v, and w are vectors such that 
{u, vi =) {v, Ww} = —3, {u, Ww} =5 
lull = 1, IIvll=2, [wll =7 
Evaluate the given expression. 
(a) {u+ v, V+ Ww 
(b) {2v—w, 3u+ aw} 
(c) {u-v— 2w, 4u + v} 


(d) llu+ vil 
(e) ll2~— vl 
(f) llu—2v + 4y|| 


21. Sketch the unit circle in $R^2$ using the given inner product.
(a) $\langle u, v \rangle = \frac{1}{4}u_1v_1 + \frac{1}{16}u_2v_2$
(b) $\langle u, v \rangle = 2u_1v_1 + u_2v_2$


Answer: 


(a) 


(b) 


23. 


24. 


25. 


26. 


27. 


28. 


Figure Ex-22 


Let u = (1, #2) and v = (v1, v2). Show that the following are inner products on R?2 by verifying that the inner product 
axioms hold. 

(a) {U,V} = 3uyVvy + Suave 

(b) (U, ¥V} =4uyvy +ugv1 +412 + 4ugv2 


Answer: 


QO 1 
-1 0 


Let u = (uj, #2, uz) and v= (v1, v2, v3). Determine which of the following are inner products on 23. For those that are 


For F = | then (V, VS = —2<0, so Axiom 4 fails. 


not, list the axioms that do not hold. 

(a) {U,V} =u1V] + u3V3 

(b) (u, v} = uty? + uve 7 uve 

(c) (U,V) = 2ujvy +.u9v2 + 4u3zVv3 
( 


u, Vv 
u, ¥} = 41V1 —42V2 + U3V3 
Show that the following identity holds for vectors in any inner product space.
\[ \|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2 \]

Answer: 
(a) 28 

15 
(b) 0 


Show that the following identity holds for vectors in any inner product space.
\[ \langle u, v \rangle = \tfrac{1}{4}\|u + v\|^2 - \tfrac{1}{4}\|u - v\|^2 \]


uy uQ V1 V2 F : 
Let 7 = “3 4 and ¥ = v3 val Show that (U7, 7) = wyv1 + u3v3 + u3¥2 + w4vq is not an inner product on M33. 
Calculus required Let the vector space 3 have the inner product 


1 
Pp. 4 =f p(x)q¢(x) dx 


29. 


30. 


31. 
32. 


(a) Find ||p|| for p= 1, P =, and p= x?. 


(b) Find d(p, q) ifp= 1 and q=%. 


Calculus required Use the inner product 


1 
p.al= fi rte) ax 


on P3, to compute {p, q}. 
(a) p=1—x4 x*4 5x7, q=x — 3x? 
(b) p=x—5x7,q=2+ 8x7 


Calculus required \n each part, use the inner product 


1 
f.e)= [ se) ax 


on C’[0, 1] to compute {f, g}. 
(a) £ =cos2ax, g= sin2ax 
(b) f =x, g=e" 


(c) f = tant, g=1 


31. Prove that Formula 4 defines an inner product on $R^n$.


32. The definition of a complex vector space was given in the first margin note in Section 4.1. The definition of a complex inner product on a complex vector space V is identical to Definition 1 except that scalars are allowed to be complex numbers, and Axiom 1 is replaced by $\langle u, v \rangle = \overline{\langle v, u \rangle}$. The remaining axioms are unchanged. A complex vector space with a complex inner product is called a complex inner product space. Prove that if V is a complex inner product space then $\langle u, kv \rangle = \bar{k}\langle u, v \rangle$.


True-False Exercises 


In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 


(a) The dot product on $R^2$ is an example of a weighted inner product.


Answer: 


True 


(b) The inner product of two vectors cannot be a negative real number. 


Answer: 


False 


(c) $\langle u, v + w \rangle = \langle v, u \rangle + \langle w, u \rangle$.


Answer: 


True 


(d) $\langle ku, kv \rangle = k^2\langle u, v \rangle$.


Answer: 


True 


(e) If $\langle u, v \rangle = 0$, then $u = \mathbf{0}$ or $v = \mathbf{0}$.
Answer: 


False 


(f) If $\|v\|^2 = 0$, then $v = \mathbf{0}$.
Answer: 


True 


(g) If A is any $n \times n$ matrix, then $\langle u, v \rangle = Au \cdot Av$ defines an inner product on $R^n$.
Answer: 


False 




6.2 Angle and Orthogonality in Inner Product Spaces


In Section 3.2 we defined the notion of "angle" between vectors in $R^n$. In this section we will extend this idea to general vector spaces. This will enable us to extend the notion of orthogonality as well, thereby setting the groundwork for a variety of new applications.


Cauchy—Schwarz Inequality 
Recall from Formula 20 of Section 3.2 that the angle $\theta$ between two vectors u and v in $R^n$ is
\[ \theta = \cos^{-1}\left( \frac{u \cdot v}{\|u\|\,\|v\|} \right) \qquad (1) \]
We were assured that this formula was valid because it followed from the Cauchy-Schwarz inequality (Theorem 3.2.4) that
\[ -1 \le \frac{u \cdot v}{\|u\|\,\|v\|} \le 1 \qquad (2) \]
as required for the inverse cosine to be defined. The following generalization of Theorem 3.2.4 will enable us to define the angle between two vectors in any real inner product space.


THEOREM 6.2.1 Cauchy—Schwarz Inequality 
If u and v are vectors in a real inner product space V, then
\[ |\langle u, v \rangle| \le \|u\|\,\|v\| \qquad (3) \]


Proof We warn you in advance that the proof presented here depends on a clever trick that is not easy to 
motivate. 


In the case where $u = \mathbf{0}$ the two sides of (3) are equal since $\langle u, v \rangle$ and $\|u\|$ are both zero. Thus, we need only consider the case where $u \ne \mathbf{0}$. Making this assumption, let
\[ a = \langle u, u \rangle, \quad b = 2\langle u, v \rangle, \quad c = \langle v, v \rangle \]
and let t be any real number. Since the positivity axiom states that the inner product of any vector with itself is nonnegative, it follows that
\[ 0 \le \langle tu + v, tu + v \rangle = \langle u, u \rangle t^2 + 2\langle u, v \rangle t + \langle v, v \rangle = at^2 + bt + c \]
This inequality implies that the quadratic polynomial $at^2 + bt + c$ has either no real roots or a repeated real root. Therefore, its discriminant must satisfy the inequality $b^2 - 4ac \le 0$. Expressing the coefficients a, b, and c in terms of the vectors u and v gives $4\langle u, v \rangle^2 - 4\langle u, u \rangle\langle v, v \rangle \le 0$ or, equivalently,
\[ \langle u, v \rangle^2 \le \langle u, u \rangle\langle v, v \rangle \]
Taking square roots of both sides and using the fact that $\langle u, u \rangle$ and $\langle v, v \rangle$ are nonnegative yields
\[ |\langle u, v \rangle| \le \langle u, u \rangle^{1/2}\langle v, v \rangle^{1/2} \quad \text{or equivalently} \quad |\langle u, v \rangle| \le \|u\|\,\|v\| \]
which completes the proof.


The following two alternative forms of the Cauchy—Schwarz inequality are useful to know: 


\[ \langle u, v \rangle^2 \le \langle u, u \rangle\langle v, v \rangle \qquad (4) \]

\[ \langle u, v \rangle^2 \le \|u\|^2\|v\|^2 \qquad (5) \]


The first of these formulas was obtained in the proof of Theorem 6.2.1, and the second is a variation of the 
first. 
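Because the inequality holds for every inner product, it can be tested against an inner product other than the dot product. The following Python sketch checks form (4) exhaustively on a small grid of integer vectors using the weighted inner product from Section 6.1; the grid size and helper names are arbitrary illustrative choices, and a finite check is of course no substitute for the proof.

```python
from itertools import product

def weighted_inner(u, v, weights=(3, 2)):
    """Weighted Euclidean inner product <u, v> = 3*u1*v1 + 2*u2*v2."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))

def cauchy_schwarz_holds(u, v):
    """Form (4): <u, v>^2 <= <u, u><v, v>."""
    return weighted_inner(u, v) ** 2 <= weighted_inner(u, u) * weighted_inner(v, v)

grid = list(product(range(-3, 4), repeat=2))   # all integer points in [-3, 3]^2
all_hold = all(cauchy_schwarz_holds(u, v) for u in grid for v in grid)

# Equality case: v is a scalar multiple of u (linearly dependent vectors).
u, v = (1, 2), (-3, -6)
equality = weighted_inner(u, v) ** 2 == weighted_inner(u, u) * weighted_inner(v, v)
print(all_hold, equality)
```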


Angle Between Vectors 


Our next goal is to define what is meant by the “angle” between vectors in a real inner product space. As the 
first step, we leave it for you to use the Cauchy—Schwarz inequality to show that 


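\[ -1 \le \frac{\langle u, v \rangle}{\|u\|\,\|v\|} \le 1 \qquad (6) \]
This being the case, there is a unique angle $\theta$ in radian measure for which
\[ \cos\theta = \frac{\langle u, v \rangle}{\|u\|\,\|v\|} \quad \text{and} \quad 0 \le \theta \le \pi \qquad (7) \]
(Figure 6.2.1). This enables us to define the angle $\theta$ between u and v to be
\[ \theta = \cos^{-1}\left( \frac{\langle u, v \rangle}{\|u\|\,\|v\|} \right) \qquad (8) \]

Figure 6.2.1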


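EXAMPLE 1 Cosine of an Angle Between Two Vectors in $R^4$

Let $R^4$ have the Euclidean inner product. Find the cosine of the angle $\theta$ between the vectors $u = (4, 3, 1, -2)$ and $v = (-2, 1, 2, 3)$.

Solution We leave it for you to verify that
\[ \|u\| = \sqrt{30}, \quad \|v\| = \sqrt{18}, \quad \text{and} \quad \langle u, v \rangle = -9 \]
from which it follows that
\[ \cos\theta = \frac{\langle u, v \rangle}{\|u\|\,\|v\|} = -\frac{9}{\sqrt{30}\sqrt{18}} = -\frac{3}{2\sqrt{15}} \]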
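The numbers in Example 1 are easy to reproduce. This Python sketch computes the cosine from Formula 7 and the angle from Formula 8 for the Euclidean inner product; the function names are illustrative, and the clamp before `math.acos` is a defensive assumption to guard against floating-point rounding pushing the ratio slightly outside $[-1, 1]$.

```python
import math

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """cos(theta) = <u, v> / (||u|| ||v||)."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def angle(u, v):
    """Angle in radians, in the interval [0, pi]."""
    c = max(-1.0, min(1.0, cosine(u, v)))   # clamp against rounding error
    return math.acos(c)

u = (4, 3, 1, -2)
v = (-2, 1, 2, 3)

cos_theta = cosine(u, v)
theta = angle(u, v)
print(dot(u, u), dot(v, v), dot(u, v))   # 30, 18, -9
print(cos_theta, theta)
```

Since the cosine is negative, the angle is obtuse.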


Properties of Length and Distance in General Inner Product Spaces 


In Section 3.2 we used the dot product to extend the notions of length and distance to R”, and we showed that 
various familiar theorems remained valid (see Theorem 3.2.5, Theorem 3.2.6, and Theorem 3.2.7). By making 
only minor adjustments to the proofs of those theorems, we can show that they remain valid in any real inner 
product space. For example, here is the generalization of Theorem 3.2.5 (the triangle inequalities). 


THEOREM 6.2.2 


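If u, v, and w are vectors in a real inner product space V, and if k is any scalar, then:
(a) $\|u + v\| \le \|u\| + \|v\|$ [Triangle inequality for vectors]
(b) $d(u, v) \le d(u, w) + d(w, v)$ [Triangle inequality for distances]

Proof (a)
\[
\begin{aligned}
\|u + v\|^2 &= \langle u + v, u + v \rangle \\
&= \langle u, u \rangle + 2\langle u, v \rangle + \langle v, v \rangle \\
&\le \langle u, u \rangle + 2|\langle u, v \rangle| + \langle v, v \rangle && \text{[Property of absolute value]} \\
&\le \langle u, u \rangle + 2\|u\|\,\|v\| + \langle v, v \rangle && \text{[By (3)]} \\
&= \|u\|^2 + 2\|u\|\,\|v\| + \|v\|^2 \\
&= (\|u\| + \|v\|)^2
\end{aligned}
\]
Taking square roots gives $\|u + v\| \le \|u\| + \|v\|$.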


Proof (b) Identical to the proof of part (b) of Theorem 3.2.5. 


Orthogonality 


Although Example 1 is a useful mathematical exercise, there is only an occasional need to compute angles in vector spaces other than $R^2$ and $R^3$. A problem of more interest in general vector spaces is ascertaining whether the angle between vectors is $\pi/2$. You should be able to see from Formula 8 that if u and v are nonzero vectors, then the angle between them is $\theta = \pi/2$ if and only if $\langle u, v \rangle = 0$. Accordingly, we make the following definition (which is applicable even if one or both of the vectors is zero).


DEFINITION 1 


Two vectors u and v in an inner product space are called orthogonal if $\langle u, v \rangle = 0$.


As the following example shows, orthogonality depends on the inner product in the sense that for different 
inner products two vectors can be orthogonal with respect to one but not the other. 


EXAMPLE 2 Orthogonality Depends on the Inner Product 


The vectors $u = (1, 1)$ and $v = (1, -1)$ are orthogonal with respect to the Euclidean inner product on $R^2$, since
\[ u \cdot v = (1)(1) + (1)(-1) = 0 \]
However, they are not orthogonal with respect to the weighted Euclidean inner product $\langle u, v \rangle = 3u_1v_1 + 2u_2v_2$, since
\[ \langle u, v \rangle = 3(1)(1) + 2(1)(-1) = 1 \ne 0 \]
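Example 2 amounts to evaluating two different inner products on the same pair of vectors. A brief sketch, assuming each inner product on $R^2$ is identified by its weights and using a hypothetical predicate `is_orthogonal`:

```python
def inner(u, v, weights):
    """Weighted Euclidean inner product; weights (1, 1) is the dot product."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))

def is_orthogonal(u, v, weights):
    """Definition 1: u and v are orthogonal when <u, v> = 0."""
    return inner(u, v, weights) == 0

u, v = (1, 1), (1, -1)
print(is_orthogonal(u, v, (1, 1)))   # Euclidean inner product
print(is_orthogonal(u, v, (3, 2)))   # weighted inner product 3*u1*v1 + 2*u2*v2
```

Exact comparison with zero is safe here only because the inputs are integers; with floating-point data a tolerance would be needed.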


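EXAMPLE 3 Orthogonal Vectors in $M_{22}$

If $M_{22}$ has the inner product of Example 6 in the preceding section, then the matrices
\[ U = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \quad \text{and} \quad V = \begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix} \]
are orthogonal, since
\[ \langle U, V \rangle = 1(0) + 0(2) + 1(0) + 1(0) = 0 \]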


CALCULUS REQUIRED 


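EXAMPLE 4 Orthogonal Vectors in $P_2$

Let $P_2$ have the inner product
\[ \langle p, q \rangle = \int_{-1}^{1} p(x)q(x)\,dx \]
and let $p = x$ and $q = x^2$. Then
\[
\begin{aligned}
\|p\| &= \langle p, p \rangle^{1/2} = \left[ \int_{-1}^{1} xx\,dx \right]^{1/2} = \left[ \int_{-1}^{1} x^2\,dx \right]^{1/2} = \sqrt{\frac{2}{3}} \\
\|q\| &= \langle q, q \rangle^{1/2} = \left[ \int_{-1}^{1} x^2x^2\,dx \right]^{1/2} = \left[ \int_{-1}^{1} x^4\,dx \right]^{1/2} = \sqrt{\frac{2}{5}} \\
\langle p, q \rangle &= \int_{-1}^{1} xx^2\,dx = \int_{-1}^{1} x^3\,dx = 0
\end{aligned}
\]
Because $\langle p, q \rangle = 0$, the vectors $p = x$ and $q = x^2$ are orthogonal relative to the given inner product.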


In Section 3.3 we proved the Theorem of Pythagoras for vectors in Euclidean n-space. The following theorem 
extends this result to vectors in any real inner product space. 


THEOREM 6.2.3 Generalized Theorem of Pythagoras 


If u and v are orthogonal vectors in an inner product space, then
\[ \|u + v\|^2 = \|u\|^2 + \|v\|^2 \]

Proof The orthogonality of u and v implies that $\langle u, v \rangle = 0$, so
\[ \|u + v\|^2 = \langle u + v, u + v \rangle = \|u\|^2 + 2\langle u, v \rangle + \|v\|^2 = \|u\|^2 + \|v\|^2 \]


CALCULUS REQUIRED 


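EXAMPLE 5 Theorem of Pythagoras in $P_2$

In Example 4 we showed that $p = x$ and $q = x^2$ are orthogonal with respect to the inner product
\[ \langle p, q \rangle = \int_{-1}^{1} p(x)q(x)\,dx \]
on $P_2$. It follows from Theorem 6.2.3 that
\[ \|p + q\|^2 = \|p\|^2 + \|q\|^2 \]
Thus, from the computations in Example 4, we have
\[ \|p + q\|^2 = \left( \sqrt{\frac{2}{3}} \right)^2 + \left( \sqrt{\frac{2}{5}} \right)^2 = \frac{2}{3} + \frac{2}{5} = \frac{16}{15} \]
We can check this result by direct integration:
\[
\begin{aligned}
\|p + q\|^2 = \langle p + q, p + q \rangle &= \int_{-1}^{1} (x + x^2)(x + x^2)\,dx \\
&= \int_{-1}^{1} x^2\,dx + 2\int_{-1}^{1} x^3\,dx + \int_{-1}^{1} x^4\,dx = \frac{2}{3} + 0 + \frac{2}{5} = \frac{16}{15}
\end{aligned}
\]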
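The integrals in Examples 4 and 5 can be evaluated exactly in code by working with polynomial coefficients. The sketch below represents a polynomial as a list of coefficients in increasing degree (an assumed convention) and uses the fact that the integral of $x^k$ over $[-1, 1]$ is $2/(k+1)$ for even k and 0 for odd k. All function names are illustrative.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Sum of two polynomials, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def integrate_symmetric(coeffs):
    """Exact integral over [-1, 1]: only even powers contribute, each 2*c/(k+1)."""
    return sum(Fraction(2 * c, k + 1) for k, c in enumerate(coeffs) if k % 2 == 0)

def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1]."""
    return integrate_symmetric(poly_mul(p, q))

p = [0, 1]       # p(x) = x
q = [0, 0, 1]    # q(x) = x^2
s = poly_add(p, q)

print(inner(p, q))                              # 0, so p and q are orthogonal
print(inner(p, p), inner(q, q), inner(s, s))    # 2/3, 2/5, 16/15
```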


Orthogonal Complements 


In Section 4.8 we defined the notion of an orthogonal complement for subspaces of $R^n$, and we used that definition to establish a geometric link between the fundamental spaces of a matrix. The following definition extends that idea to general inner product spaces.

DEFINITION 2

If W is a subspace of an inner product space V, then the set of all vectors in V that are orthogonal to every vector in W is called the orthogonal complement of W and is denoted by the symbol $W^\perp$.

In Theorem 4.8.8 we stated three properties of orthogonal complements in $R^n$. The following theorem generalizes parts (a) and (b) of that theorem to general inner product spaces.

THEOREM 6.2.4

If W is a subspace of an inner product space V, then:
(a) $W^\perp$ is a subspace of V.
(b) $W \cap W^\perp = \{\mathbf{0}\}$.


Proof (a) The set $W^\perp$ contains at least the zero vector, since $\langle \mathbf{0}, w \rangle = 0$ for every vector w in W. Thus, it remains to show that $W^\perp$ is closed under addition and scalar multiplication. To do this, suppose that u and v are vectors in $W^\perp$, so that for every vector w in W we have $\langle u, w \rangle = 0$ and $\langle v, w \rangle = 0$. It follows from the additivity and homogeneity axioms of inner products that
\[
\begin{aligned}
\langle u + v, w \rangle &= \langle u, w \rangle + \langle v, w \rangle = 0 + 0 = 0 \\
\langle ku, w \rangle &= k\langle u, w \rangle = k(0) = 0
\end{aligned}
\]
which proves that $u + v$ and $ku$ are in $W^\perp$.

Proof (b) If v is any vector in both W and $W^\perp$, then v is orthogonal to itself; that is, $\langle v, v \rangle = 0$. It follows from the positivity axiom for inner products that $v = \mathbf{0}$.


The next theorem, which we state without proof, generalizes part (c) of Theorem 4.8.8. Note, however, that this theorem applies only to finite-dimensional inner product spaces, whereas Theorem 6.2.4 does not have this restriction.

THEOREM 6.2.5

If W is a subspace of a finite-dimensional inner product space V, then the orthogonal complement of $W^\perp$ is W; that is,
\[ (W^\perp)^\perp = W \]

Theorem 6.2.5 implies that in a finite-dimensional inner product space orthogonal complements occur in pairs, each being orthogonal to the other (Figure 6.2.2).

Figure 6.2.2 Each vector in W is orthogonal to each vector in $W^\perp$ and conversely.


In our study of the fundamental spaces of a matrix in Section 4.8 we showed that the row space and null space of a matrix are orthogonal complements with respect to the Euclidean inner product on $R^n$ (Theorem 4.8.9). The following example takes advantage of that fact.


EXAMPLE 6 Basis for an Orthogonal Complement 


Let W be the subspace of $R^6$ spanned by the vectors
\[
\begin{aligned}
w_1 &= (1, 3, -2, 0, 2, 0), & w_2 &= (2, 6, -5, -2, 4, -3), \\
w_3 &= (0, 0, 5, 10, 0, 15), & w_4 &= (2, 6, 0, 8, 4, 18)
\end{aligned}
\]
Find a basis for the orthogonal complement of W.

Solution The space W is the same as the row space of the matrix
\[
A = \begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 \\
0 & 0 & 5 & 10 & 0 & 15 \\
2 & 6 & 0 & 8 & 4 & 18
\end{bmatrix}
\]
Since the row space and null space of A are orthogonal complements, our problem reduces to finding a basis for the null space of this matrix. In Example 4 of Section 4.7 we showed that
\[
v_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
v_2 = \begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad
v_3 = \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}
\]
form a basis for this null space. Expressing these vectors in comma-delimited form (to match that of $w_1$, $w_2$, $w_3$, and $w_4$), we obtain the basis vectors
\[ v_1 = (-3, 1, 0, 0, 0, 0), \quad v_2 = (-4, 0, -2, 1, 0, 0), \quad v_3 = (-2, 0, 0, 0, 1, 0) \]
You may want to check that these vectors are orthogonal to $w_1$, $w_2$, $w_3$, and $w_4$ by computing the necessary dot products.
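The suggested check of the twelve dot products is tedious by hand but immediate in code. A minimal Python sketch, using the spanning vectors and null-space basis vectors from Example 6 (the variable names are illustrative):

```python
def dot(u, v):
    """Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))

# Spanning vectors of W (the rows of A).
W = [
    (1, 3, -2, 0, 2, 0),
    (2, 6, -5, -2, 4, -3),
    (0, 0, 5, 10, 0, 15),
    (2, 6, 0, 8, 4, 18),
]

# Claimed basis for the orthogonal complement (the null space of A).
N = [
    (-3, 1, 0, 0, 0, 0),
    (-4, 0, -2, 1, 0, 0),
    (-2, 0, 0, 0, 1, 0),
]

products = [[dot(w, v) for v in N] for w in W]
all_orthogonal = all(value == 0 for row in products for value in row)
print(products)
print(all_orthogonal)
```

This confirms only that the three vectors lie in the orthogonal complement; that they form a basis for it relies on the null-space computation cited from Section 4.7.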


Concept Review 

• Cauchy-Schwarz inequality
• Angle between vectors
• Orthogonal vectors
• Orthogonal complement

Skills
• Find the angle between two vectors in an inner product space.
• Determine whether two vectors in an inner product space are orthogonal.
• Find a basis for the orthogonal complement of a subspace of an inner product space.


Exercise Set 6.2 


1. Let $R^2$, $R^3$, and $R^4$ have the Euclidean inner product. In each part, find the cosine of the angle between u and v.
(a) $u = (1, -3)$, $v = (2, 4)$
(b) $u = (-1, 0)$, $v = (3, 8)$
(c) $u = (-1, 5, 2)$, $v = (2, 4, -9)$
(d) $u = (4, 1, 8)$, $v = (1, 0, -3)$
(e) $u = (1, 0, 1, 0)$, $v = (-3, -3, -3, -3)$
(f) $u = (2, 1, 7, -1)$, $v = (4, 0, 0, 0)$

Answer:
(a) $-\dfrac{1}{\sqrt{2}}$
(b) $-\dfrac{3}{\sqrt{73}}$
(c) 0
(d) $-\dfrac{20}{9\sqrt{10}}$
(e) $-\dfrac{1}{\sqrt{2}}$
(f) $\dfrac{2}{\sqrt{55}}$

2. Let $P_2$ have the inner product in Example 7 of Section 6.1. Find the cosine of the angle between p and q.
(a) $p = -1 + 5x + 2x^2$, $q = 2 + 4x - 9x^2$
(b) $p = x - x^2$, $q = 7 + 3x + 3x^2$


3. Let Af have the inner product in Example 6 of Section 6.1 . Find the cosine of the angle between A and 
B. 


@y gle. 5 ie 2 
a-[) 5} [1 0. 


(b) ,_| 2 4] ,_[-3 1 
a=|_3 3}2=| 4 | 


Answer: 


(a) _19 
10y7 
(b) 0 
4. In each part, determine whether the given vectors are orthogonal with respect to the Euclidean inner product.
(a) $u = (-1, 3, 2)$, $v = (4, 2, -1)$
(b) $u = (-2, -2, -2)$, $v = (1, 1, 1)$
(c) $u = (u_1, u_2, u_3)$, $v = (0, 0, 0)$
(d) $u = (-4, 6, -10, 1)$, $v = (2, 1, -2, 9)$
(e) $u = (0, 3, -2, 1)$, $v = (5, 2, -1, 0)$
(f) $u = (a, b)$, $v = (-b, a)$


5. Show that $p = 1 - x + 2x^2$ and $q = 2x + x^2$ are orthogonal with respect to the inner product in Exercise 2.
6. Let 
2 4 
gen 
Which of the following matrices are orthogonal to A with respect to the inner product in Exercise 3? 
(a) —3 | 
0 2 


(b) {1 1 
0 -1 


7. Do there exist scalars k and l such that the vectors $u = (2, k, 6)$, $v = (l, 5, 3)$, and $w = (1, 2, 3)$ are mutually orthogonal with respect to the Euclidean inner product?


Answer: 


No 
8. Let $R^3$ have the Euclidean inner product, and suppose that $u = (1, 1, -1)$ and $v = (6, 7, -15)$. Find a value of k for which $\|ku + v\| = 13$.
9. Let $R^3$ have the Euclidean inner product. For which values of k are u and v orthogonal?
(a) $u = (2, 1, 3)$, $v = (1, 7, k)$
(b) $u = (k, k, 1)$, $v = (k, 5, 6)$

Answer: 


(a) k= —3 
(b) k= —2, —3 

10. Let $R^4$ have the Euclidean inner product. Find two unit vectors that are orthogonal to all three of the vectors $u = (2, 1, -4, 0)$, $v = (-1, -1, 2, 2)$, and $w = (3, 2, 5, 4)$.


11. In each part, verify that the Cauchy—Schwarz inequality holds for the given vectors using the Euclidean 
inner product. 


(a) u= (3,2), v= (4, —1) 
(b) u=(—3, 1,9), v= (2, — 1, 3) 
(c) $u = (4, 2, 1)$, $v = (8, -4, -2)$
(d) u= (0, —2, 2,1), v=(—1, —1,1, 1) 
12. In each part, verify that the Cauchy—Schwarz inequality holds for the given vectors. 


(a) $u = (-2, 1)$ and $v = (1, 0)$ using the inner product of Example 1 of Section 6.1.


(b) i= ie | and Y= E | using the inner product in Example 6 of Section 6.1 . 


(c) $p = -1 + 2x + x^2$ and $q = 2 - 4x^2$ using the inner product given in Example 7 of Section 6.1.


13. Let $R^4$ have the Euclidean inner product, and let $u = (-1, 1, 0, 2)$. Determine whether the vector u is orthogonal to the subspace spanned by the vectors $w_1 = (0, 0, 0, 0)$, $w_2 = (1, -1, 3, 0)$, and $w_3 = (4, 0, 9, 2)$.


Answer: 


No 


In Exercises 14-15, assume that 2” has the Euclidean inner product. 


14. Let W be the line in $R^2$ with equation $y = 2x$. Find an equation for $W^\perp$.

15. (a) Let W be the plane in $R^3$ with equation $x - 2y - 3z = 0$. Find parametric equations for $W^\perp$.
(b) Let W be the line in $R^3$ with parametric equations
\[ x = 2t, \quad y = -5t, \quad z = 4t \]
Find an equation for $W^\perp$.
(c) Let W be the intersection of the two planes
\[ x + y + z = 0 \quad \text{and} \quad x - y + z = 0 \]
in $R^3$. Find an equation for $W^\perp$.


Answer: 


(a) $x = t$, $y = -2t$, $z = -3t$
(b) $2x - 5y + 4z = 0$
(c) $x - z = 0$

16. Find a basis for the orthogonal complement of the subspace of $R^n$ spanned by the vectors.
(a) $v_1 = (1, -1, 3)$, $v_2 = (5, -4, -4)$, $v_3 = (7, -6, 2)$
(b) $v_1 = (2, 0, -1)$, $v_2 = (4, 0, -2)$
(c) $v_1 = (1, 4, 5, 2)$, $v_2 = (2, 1, 3, 0)$, $v_3 = (-1, 3, 2, 2)$
(d) $v_1 = (1, 4, 5, 6, 9)$, $v_2 = (3, -2, 1, 4, -1)$, $v_3 = (-1, 0, -1, -2, -1)$, $v_4 = (2, 3, 5, 7, 8)$


17. Let V be an inner product space. Show that if u and v are orthogonal unit vectors in V, then $\|u - v\| = \sqrt{2}$.


18. Let V be an inner product space. Show that if w is orthogonal to both $u_1$ and $u_2$, then it is orthogonal to $k_1u_1 + k_2u_2$ for all scalars $k_1$ and $k_2$. Interpret this result geometrically in the case where V is $R^3$ with the Euclidean inner product.


19. Let V be an inner product space. Show that if w is orthogonal to each of the vectors $u_1, u_2, \ldots, u_r$, then it is orthogonal to every vector in span$\{u_1, u_2, \ldots, u_r\}$.


20. Let $\{v_1, v_2, \ldots, v_r\}$ be a basis for an inner product space V. Show that the zero vector is the only vector in V that is orthogonal to all of the basis vectors.

21. Let $\{w_1, w_2, \ldots, w_k\}$ be a basis for a subspace W of V. Show that $W^\perp$ consists of all vectors in V that are orthogonal to every basis vector.


Prove the following generalization of Theorem 6.2.3: If vj, v3, ..., Vy are pairwise orthogonal vectors in 
an inner product space V, then 


2 2 2 2 
Wiha ss yl = [vill + llvall + - - > + Iiyell 
Prove: If u and v are » x | matrices and A is an » x »% matrix, then 


vi Al Au *< uw) A? Au\ fv" A? Av 
[y7AT én) < (a"A7a)(v"A Ae) 


Use the Cauchy—Schwarz inequality to prove that for all real values of a, b, and @, 


(acosO + bsind)* <a? +b? 


Prove: If w1, W3, -.., Wy are positive real numbers, and ifu = (21, #3, ..., %y,) and v= (v4, V3, -... Vy) 
are any two vectors in 8”, then 
JwryieqVy be woNQVA + tt Wyn Vy| 
1/2 1/2 
= rin] | wus reer nla} brivj + wyv3 fer wv} 


Show that equality holds in the Cauchy—Schwarz inequality if and only if u and v are linearly dependent. 
Use vector methods to prove that a triangle that is inscribed in a circle so that it has a diameter for a side 
must be a right triangle. [Hint: Express the vectors 4 and px in the accompanying figure in terms of u 
andv. | 


B 


y 


Figure Ex-27 


28. As illustrated in the accompanying figure, the vectors $\mathbf{u} = (1, \sqrt{3})$ and $\mathbf{v} = (-1, \sqrt{3})$ have norm 2 and
an angle of 60° between them relative to the Euclidean inner product. Find a weighted Euclidean inner
product with respect to which $\mathbf{u}$ and $\mathbf{v}$ are orthogonal unit vectors.


Figure Ex-28 


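29. Calculus required Let $f(x)$ and $g(x)$ be continuous functions on $[0, 1]$. Prove:


(a) $$\left[\int_0^1 f(x)g(x)\,dx\right]^2 \le \left[\int_0^1 f^2(x)\,dx\right]\left[\int_0^1 g^2(x)\,dx\right]$$

(b) $$\left[\int_0^1 [f(x) + g(x)]^2\,dx\right]^{1/2} \le \left[\int_0^1 f^2(x)\,dx\right]^{1/2} + \left[\int_0^1 g^2(x)\,dx\right]^{1/2}$$


[Hint: Use the Cauchy-Schwarz inequality.]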


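30. Calculus required Let $C[0, \pi]$ have the inner product
$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_0^\pi f(x)g(x)\,dx$$
and let $\mathbf{f}_n = \cos nx$ $(n = 0, 1, 2, \ldots)$. Show that if $k \ne l$, then $\mathbf{f}_k$ and $\mathbf{f}_l$ are orthogonal vectors.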


31. (a) Let $W$ be the line $y = x$ in an $xy$-coordinate system in $R^2$. Describe the subspace $W^\perp$.
(b) Let $W$ be the $y$-axis in an $xyz$-coordinate system in $R^3$. Describe the subspace $W^\perp$.
(c) Let $W$ be the $yz$-plane of an $xyz$-coordinate system in $R^3$. Describe the subspace $W^\perp$.


Answer: 


(a) The line $y = -x$
(b) The xz-plane 


(c) The x-axis 


32. Prove that Formula 4 holds for all nonzero vectors u and v in an inner product space V. 


True-False Exercises 
In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 
(a) If $\mathbf{u}$ is orthogonal to every vector of a subspace $W$, then $\mathbf{u} = \mathbf{0}$.

Answer: 


False 


(b) If $\mathbf{u}$ is a vector in both $W$ and $W^\perp$, then $\mathbf{u} = \mathbf{0}$.
Answer: 


True 


(c) If $\mathbf{u}$ and $\mathbf{v}$ are vectors in $W^\perp$, then $\mathbf{u} + \mathbf{v}$ is in $W^\perp$.
Answer: 


True 


(d) If $\mathbf{u}$ is a vector in $W^\perp$ and $k$ is a real number, then $k\mathbf{u}$ is in $W^\perp$.
Answer: 


True 


(e) If $\mathbf{u}$ and $\mathbf{v}$ are orthogonal, then $|\langle \mathbf{u}, \mathbf{v} \rangle| = \|\mathbf{u}\|\,\|\mathbf{v}\|$.
Answer: 


False 


(f) If $\mathbf{u}$ and $\mathbf{v}$ are orthogonal, then $\|\mathbf{u} + \mathbf{v}\| = \|\mathbf{u}\| + \|\mathbf{v}\|$.
Answer: 


False 




6.3 Gram—Schmidt Process; QR-Decomposition 


In many problems involving vector spaces, the problem solver is free to choose any basis for the vector space that 
seems appropriate. In inner product spaces, the solution of a problem is often greatly simplified by choosing a basis 
in which the vectors are orthogonal to one another. In this section we will show how such bases can be obtained. 


Orthogonal and Orthonormal Sets 


Recall from Section 6.2 that two vectors in an inner product space are said to be orthogonal if their inner product is 
zero. The following definition extends the notion of orthogonality to sets of vectors in an inner product space. 


DEFINITION 1 


A set of two or more vectors in a real inner product space is said to be orthogonal if all pairs of distinct
vectors in the set are orthogonal. An orthogonal set in which each vector has norm 1 is said to be
orthonormal.


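EXAMPLE 1 An Orthogonal Set in $R^3$


Let
$$\mathbf{u}_1 = (0, 1, 0), \quad \mathbf{u}_2 = (1, 0, 1), \quad \mathbf{u}_3 = (1, 0, -1)$$


and assume that $R^3$ has the Euclidean inner product. It follows that the set of vectors
$S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is orthogonal since $\langle \mathbf{u}_1, \mathbf{u}_2 \rangle = \langle \mathbf{u}_1, \mathbf{u}_3 \rangle = \langle \mathbf{u}_2, \mathbf{u}_3 \rangle = 0$.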


If $\mathbf{v}$ is a nonzero vector in an inner product space, then it follows from Theorem 6.1.1 with $k = 1/\|\mathbf{v}\|$ that


$$\left\| \frac{1}{\|\mathbf{v}\|}\mathbf{v} \right\| = \left| \frac{1}{\|\mathbf{v}\|} \right| \|\mathbf{v}\| = \frac{1}{\|\mathbf{v}\|}\|\mathbf{v}\| = 1$$


from which we see that multiplying a nonzero vector by the reciprocal of its norm produces a vector of norm 1. This
process is called normalizing $\mathbf{v}$. It follows that any orthogonal set of nonzero vectors can be converted to an
orthonormal set by normalizing each of its vectors.


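EXAMPLE 2 Constructing an Orthonormal Set


The Euclidean norms of the vectors in Example 1 are


$$\|\mathbf{u}_1\| = 1, \quad \|\mathbf{u}_2\| = \sqrt{2}, \quad \|\mathbf{u}_3\| = \sqrt{2}$$


Consequently, normalizing $\mathbf{u}_1$, $\mathbf{u}_2$, and $\mathbf{u}_3$ yields


$$\mathbf{v}_1 = \frac{\mathbf{u}_1}{\|\mathbf{u}_1\|} = (0, 1, 0), \quad \mathbf{v}_2 = \frac{\mathbf{u}_2}{\|\mathbf{u}_2\|} = \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right), \quad \mathbf{v}_3 = \frac{\mathbf{u}_3}{\|\mathbf{u}_3\|} = \left(\frac{1}{\sqrt{2}}, 0, -\frac{1}{\sqrt{2}}\right)$$


We leave it for you to verify that the set $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is orthonormal by showing that
$$\langle \mathbf{v}_1, \mathbf{v}_2 \rangle = \langle \mathbf{v}_1, \mathbf{v}_3 \rangle = \langle \mathbf{v}_2, \mathbf{v}_3 \rangle = 0 \quad \text{and} \quad \|\mathbf{v}_1\| = \|\mathbf{v}_2\| = \|\mathbf{v}_3\| = 1$$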
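
The verification left to the reader can also be carried out numerically. The following is a minimal sketch, assuming the Euclidean inner product on $R^3$ and the vectors $(0, 1, 0)$, $(1, 0, 1)$, $(1, 0, -1)$ of Example 1; the function names `dot`, `norm`, `normalize`, and `is_orthonormal` and the tolerance `tol` are illustrative choices, not part of the text.

```python
import math

def dot(u, v):
    """Euclidean inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Norm induced by the Euclidean inner product."""
    return math.sqrt(dot(v, v))

def normalize(v):
    """Multiply a nonzero vector by the reciprocal of its norm."""
    length = norm(v)
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(a / length for a in v)

def is_orthonormal(vectors, tol=1e-12):
    """Check <v_i, v_j> = 0 for i != j and <v_i, v_i> = 1, up to rounding."""
    for i, u in enumerate(vectors):
        for j, v in enumerate(vectors):
            expected = 1.0 if i == j else 0.0
            if abs(dot(u, v) - expected) > tol:
                return False
    return True

# The orthogonal set of Example 1 and its normalization.
u1, u2, u3 = (0, 1, 0), (1, 0, 1), (1, 0, -1)
v1, v2, v3 = (normalize(u) for u in (u1, u2, u3))
print(is_orthonormal([v1, v2, v3]))
```

Because the normalized vectors involve $1/\sqrt{2}$, the comparison is made up to a small tolerance rather than exactly.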


In $R^2$ any two nonzero perpendicular vectors are linearly independent because neither is a scalar multiple of the
other; and in $R^3$ any three nonzero mutually perpendicular vectors are linearly independent because no one lies in
the plane of the other two (and hence is not expressible as a linear combination of the other two). The following
theorem generalizes these observations.


THEOREM 6.3.1 


If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is an orthogonal set of nonzero vectors in an inner product space, then $S$ is linearly
independent.


Proof Assume that


$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n = \mathbf{0} \tag{1}$$
To demonstrate that $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is linearly independent, we must prove that $k_1 = k_2 = \cdots = k_n = 0$.
For each $\mathbf{v}_i$ in $S$, it follows from 1 that
$$\langle k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n, \mathbf{v}_i \rangle = \langle \mathbf{0}, \mathbf{v}_i \rangle = 0$$
or, equivalently,
$$k_1\langle \mathbf{v}_1, \mathbf{v}_i \rangle + k_2\langle \mathbf{v}_2, \mathbf{v}_i \rangle + \cdots + k_n\langle \mathbf{v}_n, \mathbf{v}_i \rangle = 0$$


From the orthogonality of $S$ it follows that $\langle \mathbf{v}_j, \mathbf{v}_i \rangle = 0$ when $j \ne i$, so this equation reduces to

$$k_i\langle \mathbf{v}_i, \mathbf{v}_i \rangle = 0$$
Since the vectors in $S$ are assumed to be nonzero, it follows from the positivity axiom for inner products that
$\langle \mathbf{v}_i, \mathbf{v}_i \rangle \ne 0$. Thus, the preceding equation implies that each $k_i$ in Equation 1 is zero, which is what we wanted to
prove.


Since an orthonormal set is orthogonal, and since
its vectors are nonzero (norm 1), it follows from
Theorem 6.3.1 that every orthonormal set is
linearly independent.


In an inner product space, a basis consisting of orthonormal vectors is called an orthonormal basis, and a basis
consisting of orthogonal vectors is called an orthogonal basis. A familiar example of an orthonormal basis is the
standard basis for $R^n$ with the Euclidean inner product:
$$\mathbf{e}_1 = (1, 0, 0, \ldots, 0), \quad \mathbf{e}_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad \mathbf{e}_n = (0, 0, 0, \ldots, 1)$$


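EXAMPLE 3 An Orthonormal Basis


In Example 2 we showed that the vectors


$$\mathbf{v}_1 = (0, 1, 0), \quad \mathbf{v}_2 = \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right), \quad \text{and} \quad \mathbf{v}_3 = \left(\frac{1}{\sqrt{2}}, 0, -\frac{1}{\sqrt{2}}\right)$$


form an orthonormal set with respect to the Euclidean inner product on $R^3$. By Theorem 6.3.1, these
vectors form a linearly independent set, and since $R^3$ is three-dimensional, it follows from Theorem
4.5.4 that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthonormal basis for $R^3$.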


Coordinates Relative to Orthonormal Bases 


One way to express a vector $\mathbf{u}$ as a linear combination of basis vectors
$$S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$$
is to convert the vector equation
$$\mathbf{u} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$$
to a linear system and solve for the coefficients $c_1, c_2, \ldots, c_n$. However, if the basis happens to be orthogonal or
orthonormal, then the following theorem shows that the coefficients can be obtained more simply by computing
appropriate inner products.


THEOREM 6.3.2 


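(a) If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is an orthogonal basis for an inner product space $V$, and if $\mathbf{u}$ is any vector in $V$,
then


$$\mathbf{u} = \frac{\langle \mathbf{u}, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 + \frac{\langle \mathbf{u}, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{u}, \mathbf{v}_n \rangle}{\|\mathbf{v}_n\|^2}\mathbf{v}_n \tag{2}$$


(b) If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is an orthonormal basis for an inner product space $V$, and if $\mathbf{u}$ is any vector in $V$,
then


$$\mathbf{u} = \langle \mathbf{u}, \mathbf{v}_1 \rangle\mathbf{v}_1 + \langle \mathbf{u}, \mathbf{v}_2 \rangle\mathbf{v}_2 + \cdots + \langle \mathbf{u}, \mathbf{v}_n \rangle\mathbf{v}_n \tag{3}$$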


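Proof (a) Since $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for $V$, every vector $\mathbf{u}$ in $V$ can be expressed in the form


$$\mathbf{u} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$$


We will complete the proof by showing that


$$c_i = \frac{\langle \mathbf{u}, \mathbf{v}_i \rangle}{\|\mathbf{v}_i\|^2} \tag{4}$$
for $i = 1, 2, \ldots, n$. To do this, observe first that
$$\langle \mathbf{u}, \mathbf{v}_i \rangle = \langle c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n, \mathbf{v}_i \rangle = c_1\langle \mathbf{v}_1, \mathbf{v}_i \rangle + c_2\langle \mathbf{v}_2, \mathbf{v}_i \rangle + \cdots + c_n\langle \mathbf{v}_n, \mathbf{v}_i \rangle$$
Since $S$ is an orthogonal set, all of the inner products in the last equality are zero except the $i$th, so we have
$$\langle \mathbf{u}, \mathbf{v}_i \rangle = c_i\langle \mathbf{v}_i, \mathbf{v}_i \rangle = c_i\|\mathbf{v}_i\|^2$$
Solving this equation for $c_i$ yields 4, which completes the proof.

Proof (b) In this case, $\|\mathbf{v}_1\| = \|\mathbf{v}_2\| = \cdots = \|\mathbf{v}_n\| = 1$, so Formula 2 simplifies to Formula 3.

Using the terminology and notation from Definition 2 of Section 4.4, it follows from Theorem 6.3.2 that the
coordinate vector of a vector $\mathbf{u}$ in $V$ relative to an orthogonal basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is
$$(\mathbf{u})_S = \left(\frac{\langle \mathbf{u}, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}, \frac{\langle \mathbf{u}, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}, \ldots, \frac{\langle \mathbf{u}, \mathbf{v}_n \rangle}{\|\mathbf{v}_n\|^2}\right) \tag{5}$$
and relative to an orthonormal basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is
$$(\mathbf{u})_S = (\langle \mathbf{u}, \mathbf{v}_1 \rangle, \langle \mathbf{u}, \mathbf{v}_2 \rangle, \ldots, \langle \mathbf{u}, \mathbf{v}_n \rangle) \tag{6}$$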


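EXAMPLE 4 A Coordinate Vector Relative to an Orthonormal Basis
Let


$$\mathbf{v}_1 = (0, 1, 0), \quad \mathbf{v}_2 = \left(-\frac{4}{5}, 0, \frac{3}{5}\right), \quad \mathbf{v}_3 = \left(\frac{3}{5}, 0, \frac{4}{5}\right)$$


It is easy to check that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthonormal basis for $R^3$ with the Euclidean inner product.
Express the vector $\mathbf{u} = (1, 1, 1)$ as a linear combination of the vectors in $S$, and find the coordinate vector
$(\mathbf{u})_S$.


Solution We leave it for you to verify that


$$\langle \mathbf{u}, \mathbf{v}_1 \rangle = 1, \quad \langle \mathbf{u}, \mathbf{v}_2 \rangle = -\frac{1}{5}, \quad \text{and} \quad \langle \mathbf{u}, \mathbf{v}_3 \rangle = \frac{7}{5}$$
Therefore, by Theorem 6.3.2 we have
$$\mathbf{u} = \mathbf{v}_1 - \frac{1}{5}\mathbf{v}_2 + \frac{7}{5}\mathbf{v}_3$$
that is,
$$(1, 1, 1) = (0, 1, 0) - \frac{1}{5}\left(-\frac{4}{5}, 0, \frac{3}{5}\right) + \frac{7}{5}\left(\frac{3}{5}, 0, \frac{4}{5}\right)$$


Thus, the coordinate vector of $\mathbf{u}$ relative to $S$ is


$$(\mathbf{u})_S = (\langle \mathbf{u}, \mathbf{v}_1 \rangle, \langle \mathbf{u}, \mathbf{v}_2 \rangle, \langle \mathbf{u}, \mathbf{v}_3 \rangle) = \left(1, -\frac{1}{5}, \frac{7}{5}\right)$$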
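
The computation in Example 4 can be checked with exact rational arithmetic. The sketch below assumes the Euclidean inner product and takes the basis to be $(0, 1, 0)$, $(-\tfrac45, 0, \tfrac35)$, $(\tfrac35, 0, \tfrac45)$; the helper names `dot` and `coordinates` are hypothetical. It applies Formula 5, which reduces to Formula 6 when every basis vector has norm 1.

```python
from fractions import Fraction as F

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def coordinates(u, orthogonal_basis):
    """Coordinates of u relative to an orthogonal basis: <u, v_i> / ||v_i||^2."""
    return [dot(u, v) / dot(v, v) for v in orthogonal_basis]

# Basis assumed for Example 4, with exact fractions to avoid rounding.
S = [
    (F(0), F(1), F(0)),
    (F(-4, 5), F(0), F(3, 5)),
    (F(3, 5), F(0), F(4, 5)),
]
u = (F(1), F(1), F(1))

coords = coordinates(u, S)

# Rebuild u from its coordinates as a consistency check.
rebuilt = tuple(sum(c * v[i] for c, v in zip(coords, S)) for i in range(3))
print(coords, rebuilt == u)
```

No linear system is solved here: each coordinate comes from a single inner product, which is the point of Theorem 6.3.2.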


EXAMPLE 5 An Orthonormal Basis from an Orthogonal Basis 


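(a) Show that the vectors
$$\mathbf{w}_1 = (0, 2, 0), \quad \mathbf{w}_2 = (3, 0, 3), \quad \mathbf{w}_3 = (-4, 0, 4)$$
form an orthogonal basis for $R^3$ with the Euclidean inner product, and use that basis to find an
orthonormal basis by normalizing each vector.


(b) Express the vector $\mathbf{u} = (1, 2, 4)$ as a linear combination of the orthonormal basis vectors obtained
in part (a).


Solution
(a) The given vectors form an orthogonal set since
$$\langle \mathbf{w}_1, \mathbf{w}_2 \rangle = 0, \quad \langle \mathbf{w}_1, \mathbf{w}_3 \rangle = 0, \quad \langle \mathbf{w}_2, \mathbf{w}_3 \rangle = 0$$


It follows from Theorem 6.3.1 that these vectors are linearly independent and hence form a basis
for $R^3$ by Theorem 4.5.4. We leave it for you to calculate the norms of $\mathbf{w}_1$, $\mathbf{w}_2$, and $\mathbf{w}_3$ and then
obtain the orthonormal basis


$$\mathbf{v}_1 = \frac{\mathbf{w}_1}{\|\mathbf{w}_1\|} = (0, 1, 0), \quad \mathbf{v}_2 = \frac{\mathbf{w}_2}{\|\mathbf{w}_2\|} = \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right), \quad \mathbf{v}_3 = \frac{\mathbf{w}_3}{\|\mathbf{w}_3\|} = \left(-\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right)$$


(b) It follows from Formula 3 that
$$\mathbf{u} = \langle \mathbf{u}, \mathbf{v}_1 \rangle\mathbf{v}_1 + \langle \mathbf{u}, \mathbf{v}_2 \rangle\mathbf{v}_2 + \langle \mathbf{u}, \mathbf{v}_3 \rangle\mathbf{v}_3$$
We leave it for you to confirm that
$$\langle \mathbf{u}, \mathbf{v}_1 \rangle = (1, 2, 4) \cdot (0, 1, 0) = 2$$
$$\langle \mathbf{u}, \mathbf{v}_2 \rangle = (1, 2, 4) \cdot \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right) = \frac{5}{\sqrt{2}}$$
$$\langle \mathbf{u}, \mathbf{v}_3 \rangle = (1, 2, 4) \cdot \left(-\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right) = \frac{3}{\sqrt{2}}$$


and hence that


$$(1, 2, 4) = 2(0, 1, 0) + \frac{5}{\sqrt{2}}\left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right) + \frac{3}{\sqrt{2}}\left(-\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right)$$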


Orthogonal Projections 


Many applied problems are best solved by working with orthogonal or orthonormal basis vectors. Such bases are
typically found by starting with some simple basis (say a standard basis) and then converting that basis into an
orthogonal or orthonormal basis. To explain exactly how that is done will require some preliminary ideas about
orthogonal projections.


In Section 3.3 we proved a result called the Projection Theorem (see Theorem 3.3.2) which dealt with the problem
of decomposing a vector $\mathbf{u}$ in $R^n$ into a sum of two terms, $\mathbf{w}_1$ and $\mathbf{w}_2$, in which $\mathbf{w}_1$ is the orthogonal projection of $\mathbf{u}$
on some nonzero vector $\mathbf{a}$ and $\mathbf{w}_2$ is orthogonal to $\mathbf{w}_1$ (Figure 3.3.2). That result is a special case of the following
more general theorem.


THEOREM 6.3.3 Projection Theorem 


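If $W$ is a finite-dimensional subspace of an inner product space $V$, then every vector $\mathbf{u}$ in $V$ can be expressed
in exactly one way as


$$\mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2 \tag{7}$$


where $\mathbf{w}_1$ is in $W$ and $\mathbf{w}_2$ is in $W^\perp$.


The vectors $\mathbf{w}_1$ and $\mathbf{w}_2$ in Formula 7 are commonly denoted by

$$\mathbf{w}_1 = \operatorname{proj}_W \mathbf{u} \quad \text{and} \quad \mathbf{w}_2 = \operatorname{proj}_{W^\perp} \mathbf{u} \tag{8}$$
They are called the orthogonal projection of $\mathbf{u}$ on $W$ and the orthogonal projection of $\mathbf{u}$ on $W^\perp$, respectively. The
vector $\mathbf{w}_2$ is also called the component of $\mathbf{u}$ orthogonal to $W$. Using the notation in 8, Formula 7 can be expressed
as


$$\mathbf{u} = \operatorname{proj}_W \mathbf{u} + \operatorname{proj}_{W^\perp} \mathbf{u} \tag{9}$$


(Figure 6.3.1). Moreover, since $\operatorname{proj}_{W^\perp} \mathbf{u} = \mathbf{u} - \operatorname{proj}_W \mathbf{u}$, we can also express Formula 9 as


$$\mathbf{u} = \operatorname{proj}_W \mathbf{u} + (\mathbf{u} - \operatorname{proj}_W \mathbf{u}) \tag{10}$$

Figure 6.3.1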


The following theorem provides formulas for calculating orthogonal projections. 


THEOREM 6.3.4 


Let W be a finite-dimensional subspace of an inner product space V. 


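(a) If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is an orthogonal basis for $W$, and $\mathbf{u}$ is any vector in $V$, then


$$\operatorname{proj}_W \mathbf{u} = \frac{\langle \mathbf{u}, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 + \frac{\langle \mathbf{u}, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{u}, \mathbf{v}_r \rangle}{\|\mathbf{v}_r\|^2}\mathbf{v}_r \tag{11}$$

(b) If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is an orthonormal basis for $W$, and $\mathbf{u}$ is any vector in $V$, then
$$\operatorname{proj}_W \mathbf{u} = \langle \mathbf{u}, \mathbf{v}_1 \rangle\mathbf{v}_1 + \langle \mathbf{u}, \mathbf{v}_2 \rangle\mathbf{v}_2 + \cdots + \langle \mathbf{u}, \mathbf{v}_r \rangle\mathbf{v}_r \tag{12}$$


Proof (a) It follows from Theorem 6.3.3 that the vector $\mathbf{u}$ can be expressed in the form $\mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2$, where
$\mathbf{w}_1 = \operatorname{proj}_W \mathbf{u}$ is in $W$ and $\mathbf{w}_2$ is in $W^\perp$; and it follows from Theorem 6.3.2 that the component $\operatorname{proj}_W \mathbf{u} = \mathbf{w}_1$ can be
expressed in terms of the basis vectors for $W$ as


$$\operatorname{proj}_W \mathbf{u} = \mathbf{w}_1 = \frac{\langle \mathbf{w}_1, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 + \frac{\langle \mathbf{w}_1, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{w}_1, \mathbf{v}_r \rangle}{\|\mathbf{v}_r\|^2}\mathbf{v}_r \tag{13}$$


Since $\mathbf{w}_2$ is orthogonal to $W$, it follows that
$$\langle \mathbf{w}_2, \mathbf{v}_1 \rangle = \langle \mathbf{w}_2, \mathbf{v}_2 \rangle = \cdots = \langle \mathbf{w}_2, \mathbf{v}_r \rangle = 0$$
so we can rewrite 13 as


$$\operatorname{proj}_W \mathbf{u} = \mathbf{w}_1 = \frac{\langle \mathbf{w}_1 + \mathbf{w}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 + \frac{\langle \mathbf{w}_1 + \mathbf{w}_2, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{w}_1 + \mathbf{w}_2, \mathbf{v}_r \rangle}{\|\mathbf{v}_r\|^2}\mathbf{v}_r$$


or, equivalently, as


$$\operatorname{proj}_W \mathbf{u} = \mathbf{w}_1 = \frac{\langle \mathbf{u}, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 + \frac{\langle \mathbf{u}, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{u}, \mathbf{v}_r \rangle}{\|\mathbf{v}_r\|^2}\mathbf{v}_r$$


Proof (b) In this case, $\|\mathbf{v}_1\| = \|\mathbf{v}_2\| = \cdots = \|\mathbf{v}_r\| = 1$, so Formula 11 simplifies to Formula 12.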


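EXAMPLE 6 Calculating Projections


Let $R^3$ have the Euclidean inner product, and let $W$ be the subspace spanned by the orthonormal
vectors $\mathbf{v}_1 = (0, 1, 0)$ and $\mathbf{v}_2 = \left(-\frac{4}{5}, 0, \frac{3}{5}\right)$. From Formula 12 the orthogonal projection of
$\mathbf{u} = (1, 1, 1)$ on $W$ is

$$\operatorname{proj}_W \mathbf{u} = \langle \mathbf{u}, \mathbf{v}_1 \rangle\mathbf{v}_1 + \langle \mathbf{u}, \mathbf{v}_2 \rangle\mathbf{v}_2 = (1)(0, 1, 0) + \left(-\frac{1}{5}\right)\left(-\frac{4}{5}, 0, \frac{3}{5}\right) = \left(\frac{4}{25}, 1, -\frac{3}{25}\right)$$


The component of $\mathbf{u}$ orthogonal to $W$ is


$$\operatorname{proj}_{W^\perp} \mathbf{u} = \mathbf{u} - \operatorname{proj}_W \mathbf{u} = (1, 1, 1) - \left(\frac{4}{25}, 1, -\frac{3}{25}\right) = \left(\frac{21}{25}, 0, \frac{28}{25}\right)$$


Observe that $\operatorname{proj}_{W^\perp} \mathbf{u}$ is orthogonal to both $\mathbf{v}_1$ and $\mathbf{v}_2$, so this vector is orthogonal to each vector in
the space $W$ spanned by $\mathbf{v}_1$ and $\mathbf{v}_2$, as it should be.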
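
Formula 11 translates directly into a short routine. The following sketch assumes the Euclidean inner product and an orthogonal (not necessarily orthonormal) basis for $W$; the names `dot` and `project` are illustrative, and the data are the vectors $(0, 1, 0)$, $(-\tfrac45, 0, \tfrac35)$ and $\mathbf{u} = (1, 1, 1)$ assumed for Example 6.

```python
from fractions import Fraction as F

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def project(u, orthogonal_basis):
    """Orthogonal projection of u on W = span(orthogonal_basis), as in Formula 11."""
    proj = [F(0)] * len(u)
    for v in orthogonal_basis:
        coeff = F(dot(u, v)) / F(dot(v, v))  # <u, v> / ||v||^2
        proj = [p + coeff * a for p, a in zip(proj, v)]
    return proj

W_basis = [(F(0), F(1), F(0)), (F(-4, 5), F(0), F(3, 5))]
u = (F(1), F(1), F(1))

proj_W = project(u, W_basis)                      # component of u in W
proj_perp = [a - p for a, p in zip(u, proj_W)]    # component of u orthogonal to W

# The orthogonal component should have zero inner product with each basis vector.
print(proj_W, proj_perp, [dot(proj_perp, v) for v in W_basis])
```

The routine is only valid when the supplied basis is orthogonal; for a general basis the basis must first be orthogonalized.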


A Geometric Interpretation of Orthogonal Projections 


If $W$ is a one-dimensional subspace of an inner product space $V$, say $\operatorname{span}\{\mathbf{a}\}$, then Formula 11 has only the one
term


$$\operatorname{proj}_W \mathbf{u} = \frac{\langle \mathbf{u}, \mathbf{a} \rangle}{\|\mathbf{a}\|^2}\mathbf{a}$$


In the special case where $V$ is $R^3$ with the Euclidean inner product, this is exactly Formula 10 of Section 3.3 for the
orthogonal projection of $\mathbf{u}$ along $\mathbf{a}$. This suggests that we can think of 11 as the sum of orthogonal projections on
“axes” determined by the basis vectors for the subspace $W$ (Figure 6.3.2).


Figure 6.3.2


The Gram—Schmidt Process 


We have seen that orthonormal bases exhibit a variety of useful properties. Our next theorem, which is the main
result in this section, shows that every nonzero finite-dimensional inner product space has an orthonormal basis. The proof
of this result is extremely important, since it provides an algorithm, or method, for converting an arbitrary basis into
an orthonormal basis.


THEOREM 6.3.5 


Every nonzero finite-dimensional inner product space has an orthonormal basis. 


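Proof Let $W$ be any nonzero finite-dimensional subspace of an inner product space, and suppose that
$\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$ is any basis for $W$. It suffices to show that $W$ has an orthogonal basis, since the vectors in that basis
can be normalized to obtain an orthonormal basis. The following sequence of steps will produce an orthogonal basis
$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ for $W$:


Step 1. Let $\mathbf{v}_1 = \mathbf{u}_1$.


Step 2. As illustrated in Figure 6.3.3, we can obtain a vector $\mathbf{v}_2$ that is orthogonal to $\mathbf{v}_1$ by computing the
component of $\mathbf{u}_2$ that is orthogonal to the space $W_1$ spanned by $\mathbf{v}_1$. Using Formula 11 to perform this
computation we obtain


$$\mathbf{v}_2 = \mathbf{u}_2 - \operatorname{proj}_{W_1} \mathbf{u}_2 = \mathbf{u}_2 - \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1$$


Of course, if $\mathbf{v}_2 = \mathbf{0}$, then $\mathbf{v}_2$ is not a basis vector. But this cannot happen, since it would then follow from
the above formula for $\mathbf{v}_2$ that


$$\mathbf{u}_2 = \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 = \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{u}_1\|^2}\mathbf{u}_1$$
which implies that $\mathbf{u}_2$ is a multiple of $\mathbf{u}_1$, contradicting the linear independence of the basis
$S = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$.


Figure 6.3.3


Step 3. To construct a vector $\mathbf{v}_3$ that is orthogonal to both $\mathbf{v}_1$ and $\mathbf{v}_2$, we compute the component of $\mathbf{u}_3$ orthogonal
to the space $W_2$ spanned by $\mathbf{v}_1$ and $\mathbf{v}_2$ (Figure 6.3.4). Using Formula 11 to perform this computation we
obtain

$$\mathbf{v}_3 = \mathbf{u}_3 - \operatorname{proj}_{W_2} \mathbf{u}_3 = \mathbf{u}_3 - \frac{\langle \mathbf{u}_3, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{\langle \mathbf{u}_3, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2$$


As in Step 2, the linear independence of $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$ ensures that $\mathbf{v}_3 \ne \mathbf{0}$. We leave the details for you.


Figure 6.3.4


Step 4. To determine a vector $\mathbf{v}_4$ that is orthogonal to $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, we compute the component of $\mathbf{u}_4$ orthogonal
to the space $W_3$ spanned by $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. From 11,


$$\mathbf{v}_4 = \mathbf{u}_4 - \operatorname{proj}_{W_3} \mathbf{u}_4 = \mathbf{u}_4 - \frac{\langle \mathbf{u}_4, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{\langle \mathbf{u}_4, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 - \frac{\langle \mathbf{u}_4, \mathbf{v}_3 \rangle}{\|\mathbf{v}_3\|^2}\mathbf{v}_3$$

Continuing in this way we will produce an orthogonal set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ after $r$ steps. Since orthogonal
sets are linearly independent, this set will be an orthogonal basis for the $r$-dimensional space $W$. By normalizing
these basis vectors we can obtain an orthonormal basis.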


The step-by-step construction of an orthogonal (or orthonormal) basis given in the foregoing proof is called the 
Gram—Schmidt process. For reference, we provide the following summary of the steps. 


The Gram-Schmidt Process 


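To convert a basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$ into an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$, perform the following
computations:

Step 1. $\mathbf{v}_1 = \mathbf{u}_1$

Step 2. $\mathbf{v}_2 = \mathbf{u}_2 - \dfrac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1$

Step 3. $\mathbf{v}_3 = \mathbf{u}_3 - \dfrac{\langle \mathbf{u}_3, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \dfrac{\langle \mathbf{u}_3, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2$

Step 4. $\mathbf{v}_4 = \mathbf{u}_4 - \dfrac{\langle \mathbf{u}_4, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \dfrac{\langle \mathbf{u}_4, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 - \dfrac{\langle \mathbf{u}_4, \mathbf{v}_3 \rangle}{\|\mathbf{v}_3\|^2}\mathbf{v}_3$


(continue for $r$ steps)


Optional Step. To convert the orthogonal basis into an orthonormal basis $\{\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_r\}$, normalize the
orthogonal basis vectors.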
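
The steps above can be sketched as a short routine. This is an illustration under stated assumptions, not a production implementation: the names `gram_schmidt`, `normalize`, and `dot` are hypothetical, the default inner product is the Euclidean one (another inner product can be passed through the `inner` parameter), and exact fractions are used so that the orthogonal vectors come out without rounding. The input basis $(1, 1, 0)$, $(1, 0, 1)$, $(0, 1, 1)$ is an arbitrary choice for demonstration.

```python
import math
from fractions import Fraction

def dot(u, v):
    """Euclidean inner product (the default choice in this sketch)."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis, inner=dot):
    """Convert a basis u_1..u_r into an orthogonal basis v_1..v_r.

    Step k subtracts from u_k its projections <u_k, v_i>/||v_i||^2 * v_i
    on the previously constructed vectors v_1..v_{k-1}.
    """
    orthogonal = []
    for u in basis:
        v = [Fraction(a) for a in u]
        for w in orthogonal:
            coeff = Fraction(inner(u, w)) / Fraction(inner(w, w))
            v = [a - coeff * b for a, b in zip(v, w)]
        if all(a == 0 for a in v):
            raise ValueError("input vectors are linearly dependent")
        orthogonal.append(v)
    return orthogonal

def normalize(v, inner=dot):
    """Optional step: scale v to norm 1 (the result is in floating point)."""
    length = math.sqrt(inner(v, v))
    return [float(a) / length for a in v]

vs = gram_schmidt([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
qs = [normalize(v) for v in vs]
print(vs)
```

The zero-vector check mirrors the argument in the proof: a zero $\mathbf{v}_k$ can only arise when the input vectors are linearly dependent.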


EXAMPLE 7 Using the Gram—Schmidt Process 


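Assume that the vector space $R^3$ has the Euclidean inner product. Apply the Gram-Schmidt process
to transform the basis vectors
$$\mathbf{u}_1 = (1, 1, 1), \quad \mathbf{u}_2 = (0, 1, 1), \quad \mathbf{u}_3 = (0, 0, 1)$$


into an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, and then normalize the orthogonal basis vectors to obtain an
orthonormal basis $\{\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3\}$.


Solution

Step 1. $\mathbf{v}_1 = \mathbf{u}_1 = (1, 1, 1)$

Step 2. $$\mathbf{v}_2 = \mathbf{u}_2 - \operatorname{proj}_{W_1} \mathbf{u}_2 = \mathbf{u}_2 - \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 = (0, 1, 1) - \frac{2}{3}(1, 1, 1) = \left(-\frac{2}{3}, \frac{1}{3}, \frac{1}{3}\right)$$

Step 3. $$\mathbf{v}_3 = \mathbf{u}_3 - \operatorname{proj}_{W_2} \mathbf{u}_3 = \mathbf{u}_3 - \frac{\langle \mathbf{u}_3, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{\langle \mathbf{u}_3, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 = (0, 0, 1) - \frac{1}{3}(1, 1, 1) - \frac{1/3}{2/3}\left(-\frac{2}{3}, \frac{1}{3}, \frac{1}{3}\right) = \left(0, -\frac{1}{2}, \frac{1}{2}\right)$$

Thus,
$$\mathbf{v}_1 = (1, 1, 1), \quad \mathbf{v}_2 = \left(-\frac{2}{3}, \frac{1}{3}, \frac{1}{3}\right), \quad \mathbf{v}_3 = \left(0, -\frac{1}{2}, \frac{1}{2}\right)$$


form an orthogonal basis for $R^3$. The norms of these vectors are
$$\|\mathbf{v}_1\| = \sqrt{3}, \quad \|\mathbf{v}_2\| = \frac{\sqrt{6}}{3}, \quad \|\mathbf{v}_3\| = \frac{1}{\sqrt{2}}$$


so an orthonormal basis for $R^3$ is


$$\mathbf{q}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} = \left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right), \quad \mathbf{q}_2 = \frac{\mathbf{v}_2}{\|\mathbf{v}_2\|} = \left(-\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}\right), \quad \mathbf{q}_3 = \frac{\mathbf{v}_3}{\|\mathbf{v}_3\|} = \left(0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$$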


Remark In the last example we normalized at the end to convert the orthogonal basis into an orthonormal basis. 
Alternatively, we could have normalized each orthogonal basis vector as soon as it was obtained, thereby producing 
an orthonormal basis step by step. However, that procedure generally has the disadvantage in hand calculation of 
producing more square roots to manipulate. A more useful variation is to “scale” the orthogonal basis vectors at 
each step to eliminate some of the fractions. For example, after Step 2 above, we could have multiplied by 3 to 
produce ( — 2, 1, 1) as the second orthogonal basis vector, thereby simplifying the calculations in Step 3. 


Erhard Schmidt (1876-1959)


Historical Note Schmidt was a German mathematician who studied for his doctoral degree at Göttingen
University under David Hilbert, one of the giants of modern mathematics. For most of his life he taught at
Berlin University where, in addition to making important contributions to many branches of mathematics,
he fashioned some of Hilbert's ideas into a general concept, called a Hilbert space, a fundamental idea in
the study of infinite-dimensional vector spaces. He first described the process that bears his name in a paper
on integral equations that he published in 1907.
[Image: Archives of the Mathematisches Forschungsinst] 


Jørgen Pedersen Gram (1850-1916)


Historical Note Gram was a Danish actuary whose early education was at village schools
supplemented by private tutoring. He obtained a doctoral degree in mathematics while working for the
Hafnia Life Insurance Company, where he specialized in the mathematics of accident insurance. It was in his
dissertation that his contributions to the Gram-Schmidt process were formulated. He eventually became
interested in abstract mathematics and received a gold medal from the Royal Danish Society of Sciences
and Letters in recognition of his work. His lifelong interest in applied mathematics never wavered, however,
and he produced a variety of treatises on Danish forest management.

[Image: wikipedia] 


CALCULUS REQUIRED 
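EXAMPLE 8 Legendre Polynomials


Let the vector space $P_2$ have the inner product


$$\langle \mathbf{p}, \mathbf{q} \rangle = \int_{-1}^{1} p(x)q(x)\,dx$$


Apply the Gram-Schmidt process to transform the standard basis $\{1, x, x^2\}$ for $P_2$ into an
orthogonal basis $\{\phi_1(x), \phi_2(x), \phi_3(x)\}$.


Solution Take $\mathbf{u}_1 = 1$, $\mathbf{u}_2 = x$, and $\mathbf{u}_3 = x^2$.
Step 1. $\mathbf{v}_1 = \mathbf{u}_1 = 1$
Step 2. We have
$$\langle \mathbf{u}_2, \mathbf{v}_1 \rangle = \int_{-1}^{1} x\,dx = 0$$
so
$$\mathbf{v}_2 = \mathbf{u}_2 - \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 = \mathbf{u}_2 = x$$


Step 3. We have
$$\langle \mathbf{u}_3, \mathbf{v}_1 \rangle = \int_{-1}^{1} x^2\,dx = \frac{2}{3}, \quad \langle \mathbf{u}_3, \mathbf{v}_2 \rangle = \int_{-1}^{1} x^3\,dx = 0, \quad \|\mathbf{v}_1\|^2 = \langle \mathbf{v}_1, \mathbf{v}_1 \rangle = \int_{-1}^{1} 1\,dx = 2$$
so
$$\mathbf{v}_3 = \mathbf{u}_3 - \frac{\langle \mathbf{u}_3, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{\langle \mathbf{u}_3, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 = x^2 - \frac{1}{3}$$
Thus, we have obtained the orthogonal basis $\{\phi_1(x), \phi_2(x), \phi_3(x)\}$ in which


$$\phi_1(x) = 1, \quad \phi_2(x) = x, \quad \phi_3(x) = x^2 - \frac{1}{3}$$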


Remark The orthogonal basis vectors in the foregoing example are often scaled so all three functions have a value 
of 1 at x = 1. The resulting polynomials 


$$1, \quad x, \quad \tfrac{1}{2}(3x^2 - 1)$$


which are known as the first three Legendre polynomials, play an important role in a variety of applications. The 
scaling does not affect the orthogonality. 
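
The same computation can be sketched in code by representing a polynomial $c_0 + c_1x + c_2x^2$ as its coefficient list `[c0, c1, c2]`. Everything here is an illustrative assumption rather than part of the text: the names `inner`, `gram_schmidt`, and `scale_to_one_at_1`, the coefficient-list representation, and the use of the closed form $\int_{-1}^{1} x^k\,dx = 2/(k+1)$ for even $k$ and $0$ for odd $k$ in place of symbolic integration.

```python
from fractions import Fraction

def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1], for coefficient lists p and q."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:  # odd powers integrate to zero on [-1, 1]
                total += Fraction(a) * Fraction(b) * Fraction(2, i + j + 1)
    return total

def gram_schmidt(basis):
    """Gram-Schmidt process with respect to the integral inner product above."""
    result = []
    for u in basis:
        v = [Fraction(c) for c in u]
        for w in result:
            coeff = inner(u, w) / inner(w, w)
            v = [a - coeff * b for a, b in zip(v, w)]
        result.append(v)
    return result

def scale_to_one_at_1(p):
    """Rescale p so that p(1) = 1; p(1) is the sum of the coefficients."""
    value_at_one = sum(p)
    return [c / value_at_one for c in p]

standard_basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # 1, x, x^2
phi = gram_schmidt(standard_basis)
legendre = [scale_to_one_at_1(p) for p in phi]
print(phi[2], legendre[2])
```

The last line reports the coefficients of $x^2 - \tfrac13$ and of its rescaled form $\tfrac12(3x^2 - 1)$, in the order constant term, $x$, $x^2$.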


Extending Orthonormal Sets to Orthonormal Bases 


Recall from part (b) of Theorem 4.5.5 that a linearly independent set in a finite-dimensional vector space can be 
enlarged to a basis by adding appropriate vectors. The following theorem is an analog of that result for orthogonal 
and orthonormal sets in finite-dimensional inner product spaces. 


THEOREM 6.3.6 


If W is a finite-dimensional inner product space, then: 
(a) Every orthogonal set of nonzero vectors in W can be enlarged to an orthogonal basis for W. 


(b) Every orthonormal set in W can be enlarged to an orthonormal basis for W. 


We will prove part (b) and leave part (a) as an exercise. 


Proof (b) Suppose that $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_s\}$ is an orthonormal set of vectors in $W$. Part (b) of Theorem 4.5.5 tells
us that we can enlarge $S$ to some basis


$$S' = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_s, \mathbf{v}_{s+1}, \ldots, \mathbf{v}_k\}$$
for $W$. If we now apply the Gram-Schmidt process to the set $S'$, then the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_s$ will not be affected
since they are already orthonormal, and the resulting set
$$S'' = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_s, \mathbf{v}'_{s+1}, \ldots, \mathbf{v}'_k\}$$


will be an orthonormal basis for $W$.
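
For $W = R^n$ with the Euclidean inner product, the idea of the proof can be sketched as follows. This is a variant, not the construction in the proof itself: instead of first enlarging $S$ to a basis, it feeds the standard basis vectors to the Gram-Schmidt step one at a time and discards any that leave (numerically) nothing new. The function name `extend_to_orthonormal_basis` and the tolerance `tol` are assumptions made for illustration.

```python
import math

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def extend_to_orthonormal_basis(orthonormal_set, n, tol=1e-12):
    """Enlarge an orthonormal set in R^n to an orthonormal basis for R^n."""
    basis = [[float(a) for a in v] for v in orthonormal_set]
    for k in range(n):
        if len(basis) == n:
            break
        e = [1.0 if i == k else 0.0 for i in range(n)]  # standard basis vector e_k
        v = e[:]
        for q in basis:
            coeff = dot(e, q)           # q has norm 1, so no division is needed
            v = [a - coeff * b for a, b in zip(v, q)]
        length = math.sqrt(dot(v, v))
        if length > tol:                # skip e_k if it already lies in the span
            basis.append([a / length for a in v])
    return basis

s = 1 / math.sqrt(2)
basis = extend_to_orthonormal_basis([(s, 0.0, s)], 3)
print(basis)
```

The vectors already in the set are left untouched, as in the proof; only the appended vectors are orthogonalized and normalized.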


OPTIONAL 
QR-Decomposition 


In recent years a numerical algorithm based on the Gram—Schmidt process, and known as QR-decomposition, has 
assumed growing importance as the mathematical foundation for a wide variety of numerical algorithms, including 
those for computing eigenvalues of large matrices. The technical aspects of such algorithms are discussed in 
textbooks that specialize in the numerical aspects of linear algebra. However, we will discuss some of the 
underlying ideas here. We begin by posing the following problem. 


Problem 


If $A$ is an $m \times n$ matrix with linearly independent column vectors, and if $Q$ is the matrix that results by
applying the Gram-Schmidt process to the column vectors of $A$, what relationship, if any, exists between $A$
and $Q$?


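To solve this problem, suppose that the column vectors of $A$ are $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$ and the orthonormal column vectors
of $Q$ are $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_n$. Thus, $A$ and $Q$ can be written in partitioned form as


$$A = [\mathbf{u}_1 \mid \mathbf{u}_2 \mid \cdots \mid \mathbf{u}_n] \quad \text{and} \quad Q = [\mathbf{q}_1 \mid \mathbf{q}_2 \mid \cdots \mid \mathbf{q}_n]$$


It follows from Theorem 6.3.2b that $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$ are expressible in terms of the vectors $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_n$ as


$$\begin{aligned}
\mathbf{u}_1 &= \langle \mathbf{u}_1, \mathbf{q}_1 \rangle\mathbf{q}_1 + \langle \mathbf{u}_1, \mathbf{q}_2 \rangle\mathbf{q}_2 + \cdots + \langle \mathbf{u}_1, \mathbf{q}_n \rangle\mathbf{q}_n \\
\mathbf{u}_2 &= \langle \mathbf{u}_2, \mathbf{q}_1 \rangle\mathbf{q}_1 + \langle \mathbf{u}_2, \mathbf{q}_2 \rangle\mathbf{q}_2 + \cdots + \langle \mathbf{u}_2, \mathbf{q}_n \rangle\mathbf{q}_n \\
&\;\;\vdots \\
\mathbf{u}_n &= \langle \mathbf{u}_n, \mathbf{q}_1 \rangle\mathbf{q}_1 + \langle \mathbf{u}_n, \mathbf{q}_2 \rangle\mathbf{q}_2 + \cdots + \langle \mathbf{u}_n, \mathbf{q}_n \rangle\mathbf{q}_n
\end{aligned}$$


Recalling from Section 1.3 (Example 9) that the $j$th column vector of a matrix product is a linear combination of the
column vectors of the first factor with coefficients coming from the $j$th column of the second factor, it follows that
these relationships can be expressed in matrix form as


$$[\mathbf{u}_1 \mid \mathbf{u}_2 \mid \cdots \mid \mathbf{u}_n] = [\mathbf{q}_1 \mid \mathbf{q}_2 \mid \cdots \mid \mathbf{q}_n]
\begin{bmatrix}
\langle \mathbf{u}_1, \mathbf{q}_1 \rangle & \langle \mathbf{u}_2, \mathbf{q}_1 \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_1 \rangle \\
\langle \mathbf{u}_1, \mathbf{q}_2 \rangle & \langle \mathbf{u}_2, \mathbf{q}_2 \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_2 \rangle \\
\vdots & \vdots & & \vdots \\
\langle \mathbf{u}_1, \mathbf{q}_n \rangle & \langle \mathbf{u}_2, \mathbf{q}_n \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_n \rangle
\end{bmatrix}$$


or more briefly as
$$A = QR \tag{14}$$
where $R$ is the second factor in the product. However, it is a property of the Gram-Schmidt process that for $j \ge 2$,
the vector $\mathbf{q}_j$ is orthogonal to $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{j-1}$. Thus, all entries below the main diagonal of $R$ are zero, and $R$ has the
form


$$R = \begin{bmatrix}
\langle \mathbf{u}_1, \mathbf{q}_1 \rangle & \langle \mathbf{u}_2, \mathbf{q}_1 \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_1 \rangle \\
0 & \langle \mathbf{u}_2, \mathbf{q}_2 \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_2 \rangle \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \langle \mathbf{u}_n, \mathbf{q}_n \rangle
\end{bmatrix} \tag{15}$$


We leave it for you to show that $R$ is invertible by showing that its diagonal entries are nonzero. Thus, Equation 14
is a factorization of $A$ into the product of a matrix $Q$ with orthonormal column vectors and an invertible upper
triangular matrix $R$. We call Equation 14 the QR-decomposition of $A$. In summary, we have the following theorem.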


THEOREM 6.3.7 QR-Decomposition 


If $A$ is an $m \times n$ matrix with linearly independent column vectors, then $A$ can be factored as
$$A = QR$$


where $Q$ is an $m \times n$ matrix with orthonormal column vectors, and $R$ is an $n \times n$ invertible upper triangular
matrix.


It is common in numerical linear algebra to say 
that a matrix with linearly independent columns 
has full column rank. 


Recall from Theorem 5.1.6 (the Equivalence Theorem) that a square matrix has linearly independent column
vectors if and only if it is invertible. Thus, it follows from the foregoing theorem that every invertible matrix has a
QR-decomposition.


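EXAMPLE 9 QR-Decomposition of a 3 × 3 Matrix


Find the QR-decomposition of


$$A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}$$
Solution The column vectors of $A$ are
$$\mathbf{u}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{u}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$


Applying the Gram-Schmidt process with normalization to these column vectors yields the
orthonormal vectors (see Example 7)


$$\mathbf{q}_1 = \begin{bmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix}, \quad \mathbf{q}_2 = \begin{bmatrix} -\frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{bmatrix}, \quad \mathbf{q}_3 = \begin{bmatrix} 0 \\ -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$$

Thus, it follows from Formula 15 that $R$ is


$$R = \begin{bmatrix} \langle \mathbf{u}_1, \mathbf{q}_1 \rangle & \langle \mathbf{u}_2, \mathbf{q}_1 \rangle & \langle \mathbf{u}_3, \mathbf{q}_1 \rangle \\ 0 & \langle \mathbf{u}_2, \mathbf{q}_2 \rangle & \langle \mathbf{u}_3, \mathbf{q}_2 \rangle \\ 0 & 0 & \langle \mathbf{u}_3, \mathbf{q}_3 \rangle \end{bmatrix} = \begin{bmatrix} \frac{3}{\sqrt{3}} & \frac{2}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ 0 & 0 & \frac{1}{\sqrt{2}} \end{bmatrix}$$


from which it follows that the QR-decomposition of $A$ is


$$\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} \frac{1}{\sqrt{3}} & -\frac{2}{\sqrt{6}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} \end{bmatrix}}_{Q} \underbrace{\begin{bmatrix} \frac{3}{\sqrt{3}} & \frac{2}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ 0 & 0 & \frac{1}{\sqrt{2}} \end{bmatrix}}_{R}$$


Show that the matrix $Q$ in Example 9 has
the property $Q^TQ = I$, and show that every
$m \times n$ matrix with orthonormal column
vectors has this property.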

Concept Review 


• Orthogonal and orthonormal sets
• Normalizing a vector
• Orthogonal projections
• Gram-Schmidt process
• QR-decomposition


Skills


• Determine whether a set of vectors is orthogonal (or orthonormal).
• Compute the coordinates of a vector with respect to an orthogonal (or orthonormal) basis.
• Find the orthogonal projection of a vector onto a subspace.
• Use the Gram-Schmidt process to construct an orthogonal (or orthonormal) basis for an inner product
space.
• Find the QR-decomposition of an invertible matrix.


Exercise Set 6.3 


1. Which of the following sets of vectors are orthogonal with respect to the Euclidean inner product on $R^2$?


(a) (0, 1), (2,0) 

fa 12) fa 
y2° Y2)" \y2" ¥2 

(©) [se _ 

(a) (0, 0), (0, 1) 


Answer: 


(a), (b), (d) 


2. Which of the sets in Exercise 1 are orthonormal with respect to the Euclidean inner product on $R^2$?


3. Which of the following sets of vectors are orthogonal with respect to the Euclidean inner product on $R^3$?
Ul he Geode) ee ee eed oe 
fa fap ys v3 v3) 2 
(2. a2 DV f21 22) i122 
a 6S SY NS 38 SENS SS 


(c) 1 1 
1, 0, 0), |9, , , (0,0, 1 
(1, > | is) ( ) 


ee te 2 ly 
Yo yo yoy yy2” ¥2 

Answer: 

(b), (d) 


4. Which of the sets in Exercise 3 are orthonormal with respect to the Euclidean inner product on $R^3$?


5. Which of the following sets of polynomials are orthonormal with respect to the inner product on $P_2$ discussed in
Example 7 of Section 6.1?


a 2 204 19 ee eee ai42% ee » 2,2 
(a) py (x) 3 gt + gt", P2x) raat Bt > P3(x) gt get ox 


©) pia) = 1, pale) = Fox + peat pala) =x 


Answer: 


(a) 
6. Which of the following sets of matrices are orthonormal with respect to the inner product on $M_{22}$ discussed in
Example 6 of Section 6.1?


(a) 2 2 1 
1 0 ? 3 : 3 : 3 
OO [tec Moa ae ee 

3 3 > 5 3 3 


sealer are 


7. Verify that the given vectors form an orthogonal set with respect to the Euclidean inner product; then convert it
to an orthonormal set by normalizing the vectors.


(a) (—1, 2), (6, 3) 
(b) (1,0, — 1), (2, 0, 2), (0, 5, 0) 


© 6.55} (- 9 ooh (FF - 5] 


1 o,-} ¢0,1,0 
5} is 70) ca 


(c) ai ls ode i pe SS 
reareR el y2'y2 f lye ye v6 

8. Verify that the set of vectors $\{(1, 0), (0, 1)\}$ is orthogonal with respect to the inner product
$\langle \mathbf{u}, \mathbf{v} \rangle = 4u_1v_1 + u_2v_2$ on $R^2$; then convert it to an orthonormal set by normalizing the vectors.
9. Verify that the vectors

ey eee! —{4 3 = 
n= | 35-0}, v2= (5. 2,0}, ¥3= (0,0, 1) 

form an orthonormal basis for $R^3$ with the Euclidean inner product; then use Theorem 6.3.2b to express each of
the following as linear combinations of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$.

(a) (1, = 1, 2) 

(b) (3, — 7.4) 


© 4.) 


Answer: 


(a) = + rae ++ 2v3 


(b) 22 y) — 2v2 + 4v3 


(ices Ges a 
a a¥2 | 4¥3 


10. Verify that the vectors 
vj =(1, =—1,2, = 1), v2=(—2, 2, 3, 2), 
v3 = (1, 2,0, =—1), v4= (1, 0,0, 1) 
form an orthogonal basis for $R^4$ with the Euclidean inner product; then use Theorem 6.3.2a to express each of
the following as linear combinations of $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and $\mathbf{v}_4$.


ey 1, 1) 


(b) (v2. — 372, 572, - 2] 


Gifat2.218 
ce ane 


11. (a) Show that the vectors
$$\mathbf{v}_1 = (1, -2, 3, -4), \quad \mathbf{v}_2 = (2, 1, -4, -3), \quad \mathbf{v}_3 = (-3, 4, 1, -2), \quad \mathbf{v}_4 = (4, 3, 2, 1)$$
form an orthogonal basis for $R^4$ with the Euclidean inner product.


(b) Use Theorem 6.3.2a to express $\mathbf{u} = (-1, 2, 3, 7)$ as a linear combination of the vectors in part (a).


Answer: 


(b) u= — iy - so72- + Ow + 3¥4 


In Exercises 12-13, an orthonormal basis with respect to the Euclidean inner product is given. Use Theorem 6.3.2b
to find the coordinate vector of $\mathbf{w}$ with respect to that basis.


12. 
© w= (,7);11 = Lt. - +5} w= [F 4] 


(b) w=(-1,0,2);m1=(3, -. 3}a 2= (3.3 
13. (a) 212 me 2 ee = eee eee 
cer ha a & i z)ur= (3.5. z\us= (5. 3] 
aS 


3 

(b) 3 1 1 1 2 

wah Lave (Bee ae w= |-—, - = 
aye} | 


(6) w= Lu, + tha, 


6 66 


In Exercises 14-15, the given vectors are orthogonal with respect to the Euclidean inner product. Find $\operatorname{proj}_W \mathbf{x}$,
where $\mathbf{x} = (1, 2, 0, -2)$ and $W$ is the subspace of $R^4$ spanned by the vectors.


14. (a) vy =(1, 1,1, 1), v2 =(1, 1, —1, —1) 
(b) vy = (0, 1, —4, — 1), v2 = (3, 5, 1, 1) 


15. (a) vy = (1,1,1, 1), v9 = (1,1, <1, —D,v3= (1, —1,1, —1) 
(b) v1 = (0, 1, —4, —1), v9 = (3, 5, 1, 1), v3 = (1, 0, 1, —4) 


Answer: 


64 -$-9 


(b) fa: 7 1 oa 


iat io" 42 


In Exercises 16-17, the given vectors are orthonormal with respect to the Euclidean inner product. Use Theorem
6.3.4b to find $\operatorname{proj}_W \mathbf{x}$, where $\mathbf{x} = (1, 2, 0, -1)$ and $W$ is the subspace of $R^4$ spanned by the vectors.


18. In Example 6 of Section 4.9 we found the orthogonal projection of the vector $\mathbf{x} = (1, 5)$ onto the line through
the origin making an angle of $\pi/6$ radians with the $x$-axis. Solve that same problem using Theorem 6.3.4.


19. Find the vectors $\mathbf{w}_1$ in $W$ and $\mathbf{w}_2$ in $W^\perp$ such that $\mathbf{x} = \mathbf{w}_1 + \mathbf{w}_2$, where $\mathbf{x}$ and $W$ are as given in
(a) Exercise 14(a). 
(b) Exercise 15(a). 


Answer: 


Ome “ie miele a 

Wy ce 1 1|, we 5 al, 1 

(b) « —/2 2 3 3 afoi2.3 3 3 
Wy] a 4? 4 > W2 4°4°4 4 


20. Find the vectors $\mathbf{w}_1$ in $W$ and $\mathbf{w}_2$ in $W^\perp$ such that $\mathbf{x} = \mathbf{w}_1 + \mathbf{w}_2$, where $\mathbf{x}$ and $W$ are as given in
(a) Exercise 16(a). 
(b) Exercise 17(a). 
21. Let $R^2$ have the Euclidean inner product. Use the Gram-Schmidt process to transform the basis $\{\mathbf{u}_1, \mathbf{u}_2\}$ into
an orthonormal basis. Draw both sets of basis vectors in the $xy$-plane.
(a) w= (1, = 3), u2= (2, 2) 
(b) w= (1,9), w= G, = 5) 


Answer: 


— 
Oo 


ted) i 


(b) 11 = (1,0), v2=(0, -1) 


22. Let $R^3$ have the Euclidean inner product. Use the Gram-Schmidt process to transform the basis $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$
into an orthonormal basis.


(a) uy = (1, 1, 1), ug = (=—1, 1, 9),u3= (1, 2, 1) 
(b) uy = (1, 0, 0), ug = (3, 7, — 2), ug = (0, 4, 1) 


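23. Let $R^4$ have the Euclidean inner product. Use the Gram-Schmidt process to transform the basis
$\{u_1, u_2, u_3, u_4\}$ into an orthonormal basis.

$u_1 = (0, 2, 1, 0)$, $u_2 = (1, -1, 0, 0)$,
$u_3 = (1, 2, 0, -1)$, $u_4 = (1, 0, 0, 1)$


Answer:


$v_1 = \left(0, \tfrac{2}{\sqrt{5}}, \tfrac{1}{\sqrt{5}}, 0\right)$, $v_2 = \left(\tfrac{5}{\sqrt{30}}, -\tfrac{1}{\sqrt{30}}, \tfrac{2}{\sqrt{30}}, 0\right)$,
$v_3 = \left(\tfrac{1}{\sqrt{10}}, \tfrac{1}{\sqrt{10}}, -\tfrac{2}{\sqrt{10}}, -\tfrac{2}{\sqrt{10}}\right)$, $v_4 = \left(\tfrac{1}{\sqrt{15}}, \tfrac{1}{\sqrt{15}}, -\tfrac{2}{\sqrt{15}}, \tfrac{3}{\sqrt{15}}\right)$


24. Let $R^3$ have the Euclidean inner product. Find an orthonormal basis for the subspace spanned by $(0, 1, 2)$,
$(-1, 0, 1)$, $(-1, 1, 3)$.


25. Let $R^3$ have the inner product
$$\langle u, v \rangle = u_1 v_1 + 2u_2 v_2 + 3u_3 v_3$$
Use the Gram-Schmidt process to transform $u_1 = (1, 1, 1)$, $u_2 = (1, 1, 0)$, $u_3 = (1, 0, 0)$ into an orthonormal
basis.


Answer:


$v_1 = \left(\tfrac{1}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}\right)$, $v_2 = \left(\tfrac{1}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}, -\tfrac{1}{\sqrt{6}}\right)$, $v_3 = \left(\tfrac{2}{\sqrt{6}}, -\tfrac{1}{\sqrt{6}}, 0\right)$


26. Let $R^3$ have the Euclidean inner product. The subspace of $R^3$ spanned by the vectors $u_1 = \left(\tfrac{4}{5}, 0, -\tfrac{3}{5}\right)$ and
$u_2 = (0, 1, 0)$ is a plane passing through the origin. Express $w = (1, 2, 3)$ in the form $w = w_1 + w_2$, where $w_1$
lies in the plane and $w_2$ is perpendicular to the plane.


27. Repeat Exercise 26 with $u_1 = (1, 1, 1)$ and $u_2 = (2, 0, -1)$.

Answer:


$w_1 = \left(\tfrac{13}{14}, \tfrac{31}{14}, \tfrac{20}{7}\right)$, $w_2 = \left(\tfrac{1}{14}, -\tfrac{3}{14}, \tfrac{1}{7}\right)$


28. Let $R^4$ have the Euclidean inner product. Express the vector $w = (-1, 2, 6, 0)$ in the form $w = w_1 + w_2$,
where $w_1$ is in the space W spanned by $u_1 = (-1, 0, 1, 2)$ and $u_2 = (0, 1, 0, 1)$, and $w_2$ is orthogonal to W.


29. Find the QR-decomposition of the matrix, where possible.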


(a) |1 —1 
3 


TA eee 


Answer: 


is ale {2 
a |S {Soft She ° 


ee = ie —) i) 
—— lari. Let ° o es — = — —. 2 
ee EE RR A ae 4p 


So c He eg TS 


i _ 


AAS 0 a 4a qe 484848 42 -424e 8 of ae 


4oate 42 ce 4g cm iam dee de 42 de o 


eee ee eee ee 


(f) Columns not linearly independent 


30. In Step 3 of the proof of Theorem 6.3.5, it was stated that “the linear independence of $\{u_1, u_2, \ldots, u_n\}$ ensures
that $v_3 \neq 0$.” Prove this statement.


31. Prove that the diagonal entries of R in Formula 15 are nonzero.


32. (Calculus required) Use Theorem 6.3.2a to express the following polynomials as linear combinations of the first
three Legendre polynomials (see the Remark following Example 8).


(a) $1 + x + 4x^2$
(b) $2 - 7x^2$
(c) $4 + 3x$


33. (Calculus required) Let $P_2$ have the inner product

$$\langle p, q \rangle = \int_0^1 p(x) q(x)\, dx$$

Apply the Gram-Schmidt process to transform the standard basis $S = \{1, x, x^2\}$ into an orthonormal basis.


Answer:


$v_1 = 1$, $v_2 = \sqrt{3}(2x - 1)$, $v_3 = \sqrt{5}(6x^2 - 6x + 1)$


34. Find vectors x and y in $R^2$ that are orthonormal with respect to the inner product $\langle u, v \rangle = 3u_1 v_1 + 2u_2 v_2$ but
are not orthonormal with respect to the Euclidean inner product.


True-False Exercises 


In parts (a)—(f) determine whether the statement is true or false, and justify your answer. 


(a) Every linearly independent set of vectors in an inner product space is orthogonal. 


Answer: 


False 


(b) Every orthogonal set of vectors in an inner product space is linearly independent. 


Answer: 


False 


(c) Every nontrivial subspace of $R^3$ has an orthonormal basis with respect to the Euclidean inner product.


Answer: 


True 


(d) Every nonzero finite-dimensional inner product space has an orthonormal basis. 


Answer: 


True 


(e) $\operatorname{proj}_W x$ is orthogonal to every vector of W.


Answer: 


False 


(f) If A is an $n \times n$ matrix with a nonzero determinant, then A has a QR-decomposition.
Answer: 


True 




6.4 Best Approximation; Least Squares 


In this section we will be concerned with linear systems that cannot be solved exactly and for which an approximate solution is 
needed. Such systems commonly occur in applications where measurement errors “perturb” the coefficients of a consistent system 
sufficiently to produce inconsistency. 


Least Squares Solutions of Linear Systems 


Suppose that $Ax = b$ is an inconsistent linear system of m equations in n unknowns in which we suspect the inconsistency to be
caused by measurement errors in the coefficients of A. Since no exact solution is possible, we will look for a vector x that comes as
“close as possible” to being a solution in the sense that it minimizes $\|b - Ax\|$ with respect to the Euclidean inner product on $R^m$.
You can think of $Ax$ as an approximation to b and $\|b - Ax\|$ as the error in that approximation: the smaller the error, the better
the approximation. This leads to the following problem.


Least Squares Problem 


Given a linear system $Ax = b$ of m equations in n unknowns, find a vector x that minimizes $\|b - Ax\|$ with respect to the
Euclidean inner product on $R^m$. We call such an x a least squares solution of the system, we call $b - Ax$ the least squares
error vector, and we call $\|b - Ax\|$ the least squares error.


To clarify the above terminology, suppose that the matrix form of $b - Ax$ is

$$b - Ax = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_m \end{bmatrix}$$

The term “least squares solution” results from the fact that minimizing $\|b - Ax\|$ also minimizes $\|b - Ax\|^2 = e_1^2 + e_2^2 + \cdots + e_m^2$.


Best Approximation 


Suppose that b is a fixed vector in $R^n$ that we would like to approximate by a vector w that is required to lie in some subspace W
of $R^n$. Unless b happens to be in W, any such approximation will result in an “error vector” $b - w$ that cannot be made equal
to 0 no matter how w is chosen (Figure 6.4.1a). However, by choosing
$$w = \operatorname{proj}_W b$$
we can make the length of the error vector
$$\|b - w\| = \|b - \operatorname{proj}_W b\|$$
as small as possible (Figure 6.4.1b).
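[Figure 6.4.1, panels (a) and (b): the vector b, its projection $\operatorname{proj}_W b$ in W, and the error vector $b - \operatorname{proj}_W b$]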


These geometric ideas suggest the following general theorem. 


THEOREM 6.4.1 Best Approximation Theorem 


If W is a finite-dimensional subspace of an inner product space V, and if b is a vector in V, then $\operatorname{proj}_W b$ is the best
approximation to b from W in the sense that
$$\|b - \operatorname{proj}_W b\| < \|b - w\|$$

for every vector w in W that is different from $\operatorname{proj}_W b$.


Proof For every vector w in W, we can write

$$b - w = (b - \operatorname{proj}_W b) + (\operatorname{proj}_W b - w) \tag{1}$$
But $\operatorname{proj}_W b - w$, being a difference of vectors in W, is itself in W; and since $b - \operatorname{proj}_W b$ is orthogonal to W, the two terms on the
right side of Formula 1 are orthogonal. Thus, it follows from the Theorem of Pythagoras (Theorem 6.2.3) that
$$\|b - w\|^2 = \|b - \operatorname{proj}_W b\|^2 + \|\operatorname{proj}_W b - w\|^2$$
Since $w \neq \operatorname{proj}_W b$, it follows that the second term in this sum is positive, and hence that
$$\|b - \operatorname{proj}_W b\|^2 < \|b - w\|^2$$
Since norms are nonnegative, it follows (from a property of inequalities) that
$$\|b - \operatorname{proj}_W b\| < \|b - w\|$$


Least Squares Solutions of Linear Systems 


One way to find a least squares solution of $Ax = b$ is to calculate the orthogonal projection $\operatorname{proj}_W b$ on the column space W of the
matrix A and then solve the equation

$$Ax = \operatorname{proj}_W b \tag{2}$$
However, we can avoid the need to calculate the projection by rewriting Formula 2 as

$$b - Ax = b - \operatorname{proj}_W b$$
and then multiplying both sides of this equation by $A^T$ to obtain

$$A^T(b - Ax) = A^T(b - \operatorname{proj}_W b) \tag{3}$$
Since $b - \operatorname{proj}_W b$ is the component of b that is orthogonal to the column space of A, it follows from Theorem 4.8.9b that this
vector lies in the null space of $A^T$, and hence that
$$A^T(b - \operatorname{proj}_W b) = 0$$
Thus, Formula 3 simplifies to

$$A^T(b - Ax) = 0$$

which we can rewrite as

$$A^T A x = A^T b \tag{4}$$

This is called the normal equation or the normal system associated with $Ax = b$. When viewed as a linear system, the individual
equations are called the normal equations associated with $Ax = b$.


In summary, we have established the following result. 


THEOREM 6.4.2
For every linear system $Ax = b$, the associated normal system
$$A^T A x = A^T b \tag{5}$$

is consistent, and all solutions of Formula 5 are least squares solutions of $Ax = b$. Moreover, if W is the column space of A, and x is
any least squares solution of $Ax = b$, then the orthogonal projection of b on W is

$$\operatorname{proj}_W b = Ax \tag{6}$$


If a linear system is consistent, then its exact solutions are 
the same as its least squares solutions, in which case the 
error is zero. 


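EXAMPLE 1 Least Squares Solution


(a) Find all least squares solutions of the linear system

$$\begin{aligned} x_1 - x_2 &= 4 \\ 3x_1 + 2x_2 &= 1 \\ -2x_1 + 4x_2 &= 3 \end{aligned}$$

(b) Find the error vector and the error.


Solution


(a) It will be convenient to express the system in the matrix form $Ax = b$, where

$$A = \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix}$$

It follows that

$$A^T A = \begin{bmatrix} 1 & 3 & -2 \\ -1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix} = \begin{bmatrix} 14 & -3 \\ -3 & 21 \end{bmatrix}$$

$$A^T b = \begin{bmatrix} 1 & 3 & -2 \\ -1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 10 \end{bmatrix}$$

so the normal system $A^T A x = A^T b$ is

$$\begin{bmatrix} 14 & -3 \\ -3 & 21 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 10 \end{bmatrix}$$

Solving this system yields a unique least squares solution, namely,

$$x_1 = \tfrac{17}{95}, \quad x_2 = \tfrac{143}{285}$$


(b) The error vector is

$$b - Ax = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} - \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} \tfrac{17}{95} \\ \tfrac{143}{285} \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} - \begin{bmatrix} -\tfrac{92}{285} \\ \tfrac{439}{285} \\ \tfrac{94}{57} \end{bmatrix} = \begin{bmatrix} \tfrac{1232}{285} \\ -\tfrac{154}{285} \\ \tfrac{77}{57} \end{bmatrix}$$

and the error is
$$\|b - Ax\| \approx 4.561$$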
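The computation in Example 1 can be checked with exact rational arithmetic. The following is a minimal sketch, not a library routine: the helper names `transpose`, `matmul`, and `solve` are illustrative choices, and `solve` assumes the coefficient matrix is invertible.

```python
from fractions import Fraction
from math import sqrt

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def solve(M, rhs):
    """Solve the square system M x = rhs by Gauss-Jordan elimination (M assumed invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)   # first usable pivot row
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

A = [[1, -1], [3, 2], [-2, 4]]
b = [4, 1, 3]

At = transpose(A)
AtA = matmul(At, A)                                           # A^T A
Atb = [sum(a * bi for a, bi in zip(row, b)) for row in At]    # A^T b
x = solve(AtA, Atb)                                           # solution of the normal system
error_vec = [bi - sum(a * xi for a, xi in zip(row, x)) for row, bi in zip(A, b)]
error = sqrt(sum(e * e for e in error_vec))                   # least squares error (float)

print(x, error_vec, error)
```

Working in `Fraction` keeps the entries of the solution and of the error vector exact; only the final norm is converted to a floating-point number.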


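EXAMPLE 2 Orthogonal Projection on a Subspace


Find the orthogonal projection of the vector $u = (-3, -3, 8, 9)$ on the subspace of $R^4$ spanned by the vectors
$$u_1 = (3, 1, 0, 1), \quad u_2 = (1, 2, 1, 1), \quad u_3 = (-1, 0, 2, -1)$$


Solution We could solve this problem by first using the Gram-Schmidt process to convert $\{u_1, u_2, u_3\}$ into an
orthonormal basis and then applying the method used in Example 6 of Section 6.3. However, the following method
is more efficient.


The subspace W of $R^4$ spanned by $u_1$, $u_2$, and $u_3$ is the column space of the matrix

$$A = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix}$$

Thus, if u is expressed as a column vector, we can find the orthogonal projection of u on W by finding a least
squares solution of the system $Ax = u$ and then calculating $\operatorname{proj}_W u = Ax$ from the least squares solution. The
computations are as follows: The system $Ax = u$ is

$$\begin{bmatrix} 3 & 1 & -1 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -3 \\ -3 \\ 8 \\ 9 \end{bmatrix}$$

so

$$A^T A = \begin{bmatrix} 3 & 1 & 0 & 1 \\ 1 & 2 & 1 & 1 \\ -1 & 0 & 2 & -1 \end{bmatrix} \begin{bmatrix} 3 & 1 & -1 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 11 & 6 & -4 \\ 6 & 7 & 0 \\ -4 & 0 & 6 \end{bmatrix}$$

$$A^T u = \begin{bmatrix} 3 & 1 & 0 & 1 \\ 1 & 2 & 1 & 1 \\ -1 & 0 & 2 & -1 \end{bmatrix} \begin{bmatrix} -3 \\ -3 \\ 8 \\ 9 \end{bmatrix} = \begin{bmatrix} -3 \\ 8 \\ 10 \end{bmatrix}$$

The normal system $A^T A x = A^T u$ in this case is

$$\begin{bmatrix} 11 & 6 & -4 \\ 6 & 7 & 0 \\ -4 & 0 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -3 \\ 8 \\ 10 \end{bmatrix}$$

Solving this system yields

$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$$

as the least squares solution of $Ax = u$ (verify), so

$$\operatorname{proj}_W u = Ax = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix} \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ 3 \\ 4 \\ 0 \end{bmatrix}$$

or, in comma-delimited notation, $\operatorname{proj}_W u = (-2, 3, 4, 0)$.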
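The method of Example 2 (solve the normal system, then multiply by A) can be sketched for any linearly independent spanning set. The function names `solve` and `project` below are hypothetical helpers written for this illustration, and the sketch assumes the spanning vectors are linearly independent so that the normal system has a unique solution.

```python
from fractions import Fraction

def solve(M, rhs):
    """Gauss-Jordan elimination for a square invertible system M x = rhs."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

def dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def project(basis, u):
    """Return (x, proj) where proj is the orthogonal projection of u on span(basis).

    The entries of A^T A are the dot products of the basis vectors with each
    other, and the entries of A^T u are their dot products with u.
    """
    gram = [[dot(vi, vj) for vj in basis] for vi in basis]    # A^T A
    rhs = [dot(vi, u) for vi in basis]                        # A^T u
    x = solve(gram, rhs)                                      # least squares solution
    proj = [sum(c * v[i] for c, v in zip(x, basis)) for i in range(len(u))]  # A x
    return x, proj

x, p = project([(3, 1, 0, 1), (1, 2, 1, 1), (-1, 0, 2, -1)], (-3, -3, 8, 9))
residual = [ui - pi for ui, pi in zip((-3, -3, 8, 9), p)]
print(x, p, residual)
```

The residual `u - proj` should be orthogonal to every spanning vector, which gives a quick independent check on the result.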


Uniqueness of Least Squares Solutions 


In general, least squares solutions of linear systems are not unique. Although the linear system in Example 1 turned out to have a
unique least squares solution, that occurred only because the coefficient matrix of the system happened to satisfy certain conditions 
that guarantee uniqueness. Our next theorem will show what those conditions are. 


THEOREM 6.4.3 


If A is an $m \times n$ matrix, then the following are equivalent.
(a) A has linearly independent column vectors.


(b) $A^T A$ is invertible.


Proof We will prove that (a) $\Rightarrow$ (b) and leave the proof that (b) $\Rightarrow$ (a) as an exercise.


(a) $\Rightarrow$ (b) Assume that A has linearly independent column vectors. The matrix $A^T A$ has size $n \times n$, so we can prove that this
matrix is invertible by showing that the linear system $A^T A x = 0$ has only the trivial solution. But if x is any solution of this
system, then $Ax$ is in the null space of $A^T$ and also in the column space of A. By Theorem 4.8.9b these spaces are orthogonal
complements, so part (b) of Theorem 6.2.4 implies that $Ax = 0$. But A is assumed to have linearly independent column vectors, so
x = 0 by Theorem 1.3.1.
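A small numerical illustration of the equivalence, offered as a sketch rather than a proof: for a matrix with two columns, $A^T A$ is $2 \times 2$, so its invertibility can be read off from a determinant. The first matrix is the coefficient matrix of Example 1; the second is a hypothetical matrix whose second column is three times its first. The helper names `gram` and `det2` are illustrative.

```python
def gram(A):
    """Return A^T A, whose (i, j) entry is the dot product of columns i and j of A."""
    cols = list(zip(*A))
    return [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

independent = [[1, -1], [3, 2], [-2, 4]]    # columns are linearly independent
dependent = [[1, 3], [-2, -6], [3, 9]]      # second column = 3 * first column

print(det2(gram(independent)))   # 285, so A^T A is invertible
print(det2(gram(dependent)))     # 0, so A^T A is singular
```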


As an exercise, try using Formula 7 to solve the problem 
in part (a) of Example 1. 


The next theorem, which follows directly from Theorem 6.4.2 and Theorem 6.4.3, gives an explicit formula for the least squares 
solution of a linear system in which the coefficient matrix has linearly independent column vectors. 


THEOREM 6.4.4 


If A is an $m \times n$ matrix with linearly independent column vectors, then for every $m \times 1$ matrix b, the linear system $Ax = b$
has a unique least squares solution. This solution is given by

$$x = (A^T A)^{-1} A^T b \tag{7}$$

Moreover, if W is the column space of A, then the orthogonal projection of b on W is

$$\operatorname{proj}_W b = Ax = A (A^T A)^{-1} A^T b \tag{8}$$
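As a worked illustration of Formula 7, consider the system of Example 1, for which $A^T A = \begin{bmatrix} 14 & -3 \\ -3 & 21 \end{bmatrix}$ and $A^T b = \begin{bmatrix} 1 \\ 10 \end{bmatrix}$. The following sketch uses the standard formula for the inverse of a $2 \times 2$ matrix, with $\det(A^T A) = (14)(21) - (-3)(-3) = 285$:

$$x = (A^T A)^{-1} A^T b = \frac{1}{285} \begin{bmatrix} 21 & 3 \\ 3 & 14 \end{bmatrix} \begin{bmatrix} 1 \\ 10 \end{bmatrix} = \frac{1}{285} \begin{bmatrix} 51 \\ 143 \end{bmatrix} = \begin{bmatrix} \tfrac{17}{95} \\ \tfrac{143}{285} \end{bmatrix}$$

This agrees with the solution obtained by solving the normal system directly.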


OPTIONAL 
The Role of QR-Decomposition in Least Squares Problems 


Formulas 7 and 8 have theoretical use but are not well suited for numerical computation. In practice, least squares solutions of
$Ax = b$ are typically found by using some variation of Gaussian elimination to solve the normal equations or by using
QR-decomposition and the following theorem.


THEOREM 6.4.5 


If A is an $m \times n$ matrix with linearly independent column vectors, and if $A = QR$ is a QR-decomposition of A (see Theorem
6.3.7), then for each b in $R^m$ the system $Ax = b$ has a unique least squares solution given by

$$x = R^{-1} Q^T b \tag{9}$$


A proof of this theorem and a discussion of its use can be found in many books on numerical methods of linear algebra. However,
you can obtain Formula 9 by making the substitution $A = QR$ in Formula 7 and using the fact that $Q^T Q = I$ to obtain

$$\begin{aligned} x &= \left((QR)^T (QR)\right)^{-1} (QR)^T b \\ &= (R^T Q^T Q R)^{-1} (QR)^T b \\ &= R^{-1} (R^T)^{-1} R^T Q^T b \\ &= R^{-1} Q^T b \end{aligned}$$
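Formula 9 can be sketched in code as follows. This is an illustration under stated assumptions, not a production routine: the QR-decomposition is built with the classical Gram-Schmidt process in floating-point arithmetic, the columns of A are assumed linearly independent, and the function name `qr_least_squares` is a hypothetical choice. Rather than forming $R^{-1}$ explicitly, the sketch solves the upper triangular system $Rx = Q^T b$ by back substitution.

```python
from math import sqrt

def qr_least_squares(A, b):
    """Least squares solution of A x = b via x = R^{-1} Q^T b.

    Assumes the columns of A are linearly independent.
    """
    m, n = len(A), len(A[0])
    cols = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    Q = []                                   # orthonormal vectors q_1, ..., q_n
    R = [[0.0] * n for _ in range(n)]        # upper triangular factor
    for j, a in enumerate(cols):
        v = a[:]
        for i, q in enumerate(Q):
            R[i][j] = sum(qk * ak for qk, ak in zip(q, a))    # <a_j, q_i>
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]
        R[j][j] = sqrt(sum(vk * vk for vk in v))              # norm of the remainder
        Q.append([vk / R[j][j] for vk in v])
    y = [sum(qk * bk for qk, bk in zip(q, b)) for q in Q]     # Q^T b
    x = [0.0] * n
    for i in reversed(range(n)):                              # back substitution on R x = y
        s = y[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / R[i][i]
    return x

# The system of Example 1 in this section.
x = qr_least_squares([[1, -1], [3, 2], [-2, 4]], [4, 1, 3])
print(x)
```

For the system of Example 1 the result should agree, up to rounding, with the exact solution $x_1 = 17/95$, $x_2 = 143/285$ found from the normal equations.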


Orthogonal Projections on Subspaces of $R^m$


In Section 4.8 we showed how to compute orthogonal projections on the coordinate axes of a rectangular coordinate system in $R^2$
and more generally on lines through the origin of $R^3$. We will now consider the problem of finding orthogonal projections on
subspaces of $R^m$. We begin with the following definition.


DEFINITION 1 


If W is a subspace of $R^m$, then the linear transformation $P: R^m \to W$ that maps each vector x in $R^m$ into its orthogonal
projection $\operatorname{proj}_W x$ in W is called the orthogonal projection of $R^m$ on W.


It follows from Formula 8 that the standard matrix for the transformation P is

$$[P] = A (A^T A)^{-1} A^T \tag{10}$$
where A is constructed using any basis for W as its column vectors.


EXAMPLE 3 The Standard Matrix for an Orthogonal Projection on a Line


We showed in Formula 16 of Section 4.9 that

$$P_\theta = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix}$$

is the standard matrix for the orthogonal projection on the line W through the origin of $R^2$ that makes an angle $\theta$ with
the positive x-axis. Derive this result using Formula 10.


Solution The column vectors of A can be formed from any basis for W. Since W is one-dimensional, we can take
$w = (\cos\theta, \sin\theta)$ as the basis vector (Figure 6.4.2), so

$$A = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$$

We leave it for you to show that $A^T A$ is the $1 \times 1$ identity matrix. Thus, Formula 10 simplifies to

$$[P] = A (A^T A)^{-1} A^T = A A^T = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} = P_\theta$$


Figure 6.4.2
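Formula 10 for a one-dimensional subspace can be sketched numerically. In the illustration below, A is the single column w, so $A^T A$ is the $1 \times 1$ matrix $[w \cdot w]$ and $[P] = w w^T / (w \cdot w)$. The function name `line_projection_matrix` is a hypothetical helper; the second call uses a basis vector that is not a unit vector to show that any basis for W gives the same matrix.

```python
from math import cos, sin, pi

def line_projection_matrix(w):
    """[P] = A (A^T A)^{-1} A^T for the one-column matrix A = w (w must be nonzero)."""
    ww = sum(c * c for c in w)                       # the 1 x 1 matrix A^T A
    return [[wi * wj / ww for wj in w] for wi in w]  # outer product scaled by 1 / (w . w)

theta = pi / 6
P = line_projection_matrix((cos(theta), sin(theta)))
P_scaled = line_projection_matrix((3 * cos(theta), 3 * sin(theta)))

# Applying P twice should change nothing, since a projected vector already lies on the line.
P_squared = [[sum(P[i][k] * P[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print(P)
```

Up to rounding, the entries of `P` should match $\cos^2\theta$, $\sin\theta\cos\theta$, and $\sin^2\theta$ for $\theta = \pi/6$.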


Another View of Least Squares 


Recall from Theorem 4.8.9 that the null space and row space of an $m \times n$ matrix A are orthogonal complements, as are the null
space of $A^T$ and the column space of A. Thus, given a linear system $Ax = b$ in which A is an $m \times n$ matrix, the Projection
Theorem (6.3.3) tells us that the vectors x and b can each be decomposed into sums of orthogonal terms as
$$x = x_{\mathrm{row}(A)} + x_{\mathrm{null}(A)} \quad \text{and} \quad b = b_{\mathrm{null}(A^T)} + b_{\mathrm{col}(A)}$$

where $x_{\mathrm{row}(A)}$ and $x_{\mathrm{null}(A)}$ are the orthogonal projections of x on the row space of A and the null space of A, and the vectors
$b_{\mathrm{null}(A^T)}$ and $b_{\mathrm{col}(A)}$ are the orthogonal projections of b on the null space of $A^T$ and the column space of A.


In Figure 6.4.3 we have represented the fundamental spaces of A by perpendicular lines in $R^n$ and $R^m$ on which we indicated the
orthogonal projections of x and b. (This, of course, is only pictorial since the fundamental spaces need not be one-dimensional.)
The figure shows $Ax$ as a point in the column space of A and conveys that $b_{\mathrm{col}(A)}$ is the point in col(A) that is closest to b. This
illustrates that the least squares solutions of $Ax = b$ are the exact solutions of the equation $Ax = b_{\mathrm{col}(A)}$.


[Figure 6.4.3: the fundamental spaces null(A), row(A), col(A), and null($A^T$) drawn as perpendicular lines, with the projections of x and b marked]


More on the Equivalence Theorem 


As our final result in the main part of this section we will add one additional part to Theorem 5.1.6. 


THEOREM 6.4.6 Equivalent Statements 


If A is an $n \times n$ matrix, then the following statements are equivalent.
(a) A is invertible.

(b) $Ax = 0$ has only the trivial solution.

(c) The reduced row echelon form of A is $I_n$.

(d) A is expressible as a product of elementary matrices.

(e) $Ax = b$ is consistent for every $n \times 1$ matrix b.

(f) $Ax = b$ has exactly one solution for every $n \times 1$ matrix b.

(g) $\det(A) \neq 0$.

(h) The column vectors of A are linearly independent.

(i) The row vectors of A are linearly independent.

(j) The column vectors of A span $R^n$.

(k) The row vectors of A span $R^n$.

(l) The column vectors of A form a basis for $R^n$.

(m) The row vectors of A form a basis for $R^n$.

(n) A has rank n.

(o) A has nullity 0.

(p) The orthogonal complement of the null space of A is $R^n$.

(q) The orthogonal complement of the row space of A is {0}.

(r) The range of $T_A$ is $R^n$.

(s) $T_A$ is one-to-one.

(t) $\lambda = 0$ is not an eigenvalue of A.

(u) $A^T A$ is invertible.


The proof of part (u) follows from part (h) of this theorem and Theorem 6.4.3 applied to square matrices.


OPTIONAL 


We now have all the ingredients needed to prove Theorem 6.3.3 in the special case where V is the vector space $R^n$.


Proof of Theorem 6.3.3 We will leave the case where $W = \{0\}$ as an exercise, so assume that $W \neq \{0\}$. Let
$\{v_1, v_2, \ldots, v_k\}$ be any basis for W, and form the $n \times k$ matrix M that has these basis vectors as successive columns. This makes
W the column space of M and hence $W^\perp$ the null space of $M^T$. We will complete the proof by showing that every vector u in $R^n$
can be written in exactly one way as

$$u = w_1 + w_2$$
where $w_1$ is in the column space of M and $M^T w_2 = 0$. However, to say that $w_1$ is in the column space of M is equivalent to saying
$w_1 = Mx$ for some vector x in $R^k$, and to say that $M^T w_2 = 0$ is equivalent to saying that $M^T(u - w_1) = 0$. Thus, if we can
show that the equation

$$M^T(u - Mx) = 0 \tag{11}$$

has a unique solution for x, then $w_1 = Mx$ and $w_2 = u - w_1$ will be uniquely determined vectors with the required properties. To
do this, let us rewrite Formula 11 as

$$M^T M x = M^T u$$
Since the matrix M has linearly independent column vectors, the matrix $M^T M$ is invertible by Theorem 6.4.3 and hence the
equation has a unique solution as required to complete the proof.


Concept Review

• Least squares problem
• Least squares solution
• Least squares error vector
• Least squares error
• Best approximation
• Normal equation
• Orthogonal projection


Skills

• Find the least squares solution of a linear system.
• Find the error and error vector associated with a least squares solution to a linear system.
• Use the techniques developed in this section to compute orthogonal projections.
• Find the standard matrix of an orthogonal projection.


Exercise Set 6.4 


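1. Find the normal system associated with the given linear system.


(a) $\begin{bmatrix} 1 & -1 \\ 2 & 3 \\ 4 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix}$


(b) $\begin{bmatrix} 2 & -1 & 0 \\ 3 & 1 & 2 \\ -1 & 4 & 5 \\ 1 & 2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 1 \\ 2 \end{bmatrix}$

Answer:


(a) $\begin{bmatrix} 21 & 25 \\ 25 & 35 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 20 \\ 20 \end{bmatrix}$

(b) $\begin{bmatrix} 15 & -1 & 5 \\ -1 & 22 & 30 \\ 5 & 30 & 45 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 9 \\ 13 \end{bmatrix}$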

In Exercises 2-4, find the least squares solution of the linear equation $Ax = b$.


2. (a) 1 =1 
=|2 3);b=]-1 
4 5 
(b) 2-2 
A=/1 1|;b=] -1 
3 


al 
FBS, 
2 
ww 
ths 
II 
| rr es a rae | 
a 
—> 
MR Re 
i— 
II 
I 
ao ~) 
ww o jn ee OS FR 


(b) 10-1 
21 <2 
“ 11 0 
11-1 
Answer: 
(a) x1 =5, m= 
(b) *1= 12, x2= —3, x3=9 
4. (a) 3 2 =1 2 
=/1—-4 3),;b=|-—2 
1 #10 +? 1 
(b) 2 0 =1 0 
1-2 2 6 
_— b= 
2 2-1 0 0 
0 1 =1 6 


In Exercises 5-6, find the least squares error vector $e = b - Ax$ resulting from the least squares solution x and verify that it is
orthogonal to the column space of A.


5. (a) A and b are as in Exercise 3(a). 
(b) A and b are as in Exercise 3(b). 


Answer: 


fs) 
ll 
Bolo polos 


I 
WoWW Ww 


tas] 
II 


6. (a) A and b are as in Exercise 4(a). 
(b) A and b are as in Exercise 4(b). 


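7. Find all least squares solutions of $Ax = b$ and confirm that all of the solutions have the same error vector. Compute the least
squares error.


(a) $A = \begin{bmatrix} 2 & 1 \\ 4 & 2 \\ -2 & 1 \end{bmatrix}$; $b = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & 3 \\ -2 & -6 \\ 3 & 9 \end{bmatrix}$; $b = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$

(c) $A = \begin{bmatrix} -1 & 3 & 2 \\ 2 & 1 & 3 \\ 0 & 1 & 1 \end{bmatrix}$; $b = \begin{bmatrix} 7 \\ 0 \\ -7 \end{bmatrix}$


Answer:


(a) Solution: $x = \left(\tfrac{1}{10}, \tfrac{6}{5}\right)$; least squares error: $\tfrac{4}{5}\sqrt{5}$

(b) Solution: $x = \left(\tfrac{2}{7}, 0\right) + t(-3, 1)$ (t a real number); least squares error: $\tfrac{1}{7}\sqrt{42}$

(c) Solution: $x = \left(-\tfrac{7}{6}, \tfrac{7}{6}, 0\right) + t(-1, -1, 1)$ (t a real number); least squares error: $\tfrac{1}{2}\sqrt{294}$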


8. Find the orthogonal projection of u on the subspace of $R^3$ spanned by the vectors $v_1$ and $v_2$.
(a) $u = (2, 1, 3)$; $v_1 = (1, 1, 0)$, $v_2 = (1, 2, 1)$
(b) u= (1, — 6,1); w= (-1,2,1), vo=(, 2,4) 


9. Find the orthogonal projection of u on the subspace of $R^4$ spanned by the vectors $v_1$, $v_2$, and $v_3$.


(a) u= (6, 3, 9, 6); v1 = (2, 1, 1, 1), v2 = (1, 0, 1, 1), v3 =(—2, — 1,0, -—1) 
(b) u=(—2, 0, 2, 4); v1 = (1, 1, 3, 0), va =(—2, —1, —2, 1), v3=(—3, —1, 1,3) 


Answer:

(a) $(7, 2, 9, 5)$

(b) $\left(-\tfrac{12}{5}, -\tfrac{4}{5}, \tfrac{12}{5}, \tfrac{16}{5}\right)$


10. Find the orthogonal projection of u= (5, 6, 7, 2) on the solution space of the homogeneous linear system 
Xp+ xXQ+ X3 =0 
2x2 +2%3+%4=0 


11. In each part, find $\det(A^T A)$, and apply Theorem 6.4.3 to determine whether A has linearly independent column vectors.


(a) $A = \begin{bmatrix} -1 & 3 & 2 \\ 2 & 1 & 3 \\ 0 & 1 & 1 \end{bmatrix}$

(b) $A = \begin{bmatrix} 2 & -1 & 3 \\ 0 & 1 & 1 \\ -1 & 0 & -2 \\ 4 & -5 & 3 \end{bmatrix}$

Answer:


(a) $\det(A^T A) = 0$; A does not have linearly independent column vectors.


(b) $\det(A^T A) = 0$; A does not have linearly independent column vectors.


12. Use Formula 10 and the method of Example 3 to find the standard matrix for the orthogonal projection $P: R^2 \to R^2$ onto
(a) the x-axis.

(b) the y-axis.

[Note: Compare your results to Table 3 of Section 4.9.]


13. Use Formula 10 and the method of Example 3 to find the standard matrix for the orthogonal projection $P: R^3 \to R^3$ onto
(a) the xz-plane.

(b) the yz-plane.

[Note: Compare your results to Table 4 of Section 4.9.]


Answer:

(a) $[P] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(b) $[P] = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$


14. Show that if $w = (a, b, c)$ is a nonzero vector, then the standard matrix for the orthogonal projection of $R^3$ on the line
span{w} is

$$P = \frac{1}{a^2 + b^2 + c^2} \begin{bmatrix} a^2 & ab & ac \\ ab & b^2 & bc \\ ac & bc & c^2 \end{bmatrix}$$


15. Let W be the plane with equation $5x - 3y + z = 0$.

(a) Find a basis for W.

(b) Use Formula 10 to find the standard matrix for the orthogonal projection on W.

(c) Use the matrix obtained in part (b) to find the orthogonal projection of a point $P_0(x_0, y_0, z_0)$ on W.

(d) Find the distance between the point $P_0(1, -2, 4)$ and the plane W, and check your result using Theorem 3.3.4.


Answer:


(a) $(1, 0, -5)$, $(0, 1, 3)$

(b) $[P] = \frac{1}{35} \begin{bmatrix} 10 & 15 & -5 \\ 15 & 26 & 3 \\ -5 & 3 & 34 \end{bmatrix}$

(c) $\left(\dfrac{2x_0 + 3y_0 - z_0}{7}, \dfrac{15x_0 + 26y_0 + 3z_0}{35}, \dfrac{-5x_0 + 3y_0 + 34z_0}{35}\right)$

(d) $\tfrac{3}{7}\sqrt{35}$


16. Let W be the line with parametric equations
$$x = 2t, \quad y = -t, \quad z = 4t$$
(a) Find a basis for W.
(b) Use Formula 10 to find the standard matrix for the orthogonal projection on W.
(c) Use the matrix obtained in part (b) to find the orthogonal projection of a point $P_0(x_0, y_0, z_0)$ on W.
(d) Find the distance between the point $P_0(2, 1, -3)$ and the line W.


17. In $R^3$, consider the line l given by the equations


and the line m given by the equations

$$x = s, \quad y = 2s - 1, \quad z = 1$$
Let P be a point on l, and let Q be a point on m. Find the values of t and s that minimize the distance between the lines by
minimizing the squared distance $\|P - Q\|^2$.


Answer:


$s = t = 1$


18. Prove: If A has linearly independent column vectors, and if $Ax = b$ is consistent, then the least squares solution of $Ax = b$ and
the exact solution of $Ax = b$ are the same.


19. Prove: If A has linearly independent column vectors, and if b is orthogonal to the column space of A, then the least squares
solution of $Ax = b$ is $x = 0$.


20. Let $P: R^m \to W$ be the orthogonal projection of $R^m$ onto a subspace W.

(a) Prove that $[P]^2 = [P]$.

(b) What does the result in part (a) imply about the composition $P \circ P$?

(c) Show that $[P]$ is symmetric.


21. Let A be an $m \times n$ matrix with linearly independent row vectors. Find a standard matrix for the orthogonal projection of $R^n$
onto the row space of A. [Hint: Start with Formula 10.]


Answer:


$[P] = A^T (A A^T)^{-1} A$


22. Prove the implication (b) $\Rightarrow$ (a) of Theorem 6.4.3.


True-False Exercises 


In parts (a)-(h) determine whether the statement is true or false, and justify your answer. 


(a) If A is an $m \times n$ matrix, then $A^T A$ is a square matrix.


Answer: 


True 


(b) If $A^T A$ is invertible, then A is invertible.


Answer: 


False 


(c) If A is invertible, then $A^T A$ is invertible.


Answer: 


True 


(d) If $Ax = b$ is a consistent linear system, then $A^T A x = A^T b$ is also consistent.
Answer: 


True 


(e) If $Ax = b$ is an inconsistent linear system, then $A^T A x = A^T b$ is also inconsistent.


Answer: 


False 


(f) Every linear system has a least squares solution. 
Answer: 


True 


(g) Every linear system has a unique least squares solution. 
Answer: 


False 


(h) If A is an $m \times n$ matrix with linearly independent columns and b is in $R^m$, then $Ax = b$ has a unique least squares solution.
Answer: 


True 




6.5 Least Squares Fitting to Data 


In this section we will use results about orthogonal projections in inner product spaces to obtain a technique 
for fitting a line or other polynomial curve to a set of experimentally determined points in the plane. 


Fitting a Curve to Data 


A common problem in experimental work is to obtain a mathematical relationship $y = f(x)$ between two
variables x and y by “fitting” a curve to points in the plane corresponding to various experimentally
determined values of x and y, say

$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$$

On the basis of theoretical considerations or simply by observing the pattern of the points, the experimenter
decides on the general form of the curve $y = f(x)$ to be fitted. Some possibilities are (Figure 6.5.1)


(a) A straight line: $y = a + bx$
(b) A quadratic polynomial: $y = a + bx + cx^2$
(c) A cubic polynomial: $y = a + bx + cx^2 + dx^3$


Because the points are obtained experimentally, there is often some measurement “error” in the data, making 
it impossible to find a curve of the desired form that passes through all the points. Thus, the idea is to choose 
the curve (by determining its coefficients) that “best” fits the data. We begin with the simplest and most 
common case: fitting a straight line to data points. 


Figure 6.5.1 (a) $y = a + bx$ (b) $y = a + bx + cx^2$ (c) $y = a + bx + cx^2 + dx^3$


Least Squares Fit of a Straight Line 


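Suppose we want to fit a straight line $y = a + bx$ to the experimentally determined points

$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$$
If the data points were collinear, the line would pass through all n points, and the unknown coefficients a and
b would satisfy the equations

$$\begin{aligned} y_1 &= a + bx_1 \\ y_2 &= a + bx_2 \\ &\;\;\vdots \\ y_n &= a + bx_n \end{aligned}$$
We can write this system in matrix form as
$$\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
or more compactly as
$$Mv = y \tag{1}$$
where
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad v = \begin{bmatrix} a \\ b \end{bmatrix} \tag{2}$$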


If the data points are not collinear, then it is impossible to find coefficients a and b that satisfy system 1
exactly; that is, the system is inconsistent. In this case we will look for a least squares solution

$$v = v^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix}$$

We call a line $y = a^* + b^* x$ whose coefficients come from a least squares solution a regression line or a
least squares straight line fit to the data. To explain this terminology, recall that a least squares solution of system 1
minimizes

$$\|y - Mv\| \tag{3}$$

If we express the square of Formula 3 in terms of components, we obtain
$$\|y - Mv\|^2 = (y_1 - a - bx_1)^2 + (y_2 - a - bx_2)^2 + \cdots + (y_n - a - bx_n)^2 \tag{4}$$

If we now let

$$d_1 = |y_1 - a - bx_1|, \quad d_2 = |y_2 - a - bx_2|, \quad \ldots, \quad d_n = |y_n - a - bx_n|$$

then Formula 4 can be written as
$$\|y - Mv\|^2 = d_1^2 + d_2^2 + \cdots + d_n^2 \tag{5}$$

As illustrated in Figure 6.5.2, the number $d_i$ can be interpreted as the vertical distance between the line
$y = a + bx$ and the data point $(x_i, y_i)$. This distance is a measure of the “error” at the point $(x_i, y_i)$
resulting from the inexact fit of $y = a + bx$ to the data points, the assumption being that the $x_i$ are known
exactly and that all the error is in the measurement of the $y_i$. Since Formulas 3 and 5 are minimized by the same vector
$v^*$, the least squares straight line fit minimizes the sum of the squares of the estimated errors $d_i$, hence the
name least squares straight line fit.
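The quantity being minimized can be made concrete with a small sketch. The function name `sum_squared_deviations` is a hypothetical helper, and the data are the four points (0, 1), (1, 3), (2, 4), (3, 4), whose least squares line is $y = 1.5 + x$. The sketch evaluates $d_1^2 + \cdots + d_n^2$ for that line and for a grid of nearby lines; every nearby line should give a strictly larger sum.

```python
from fractions import Fraction

def sum_squared_deviations(points, a, b):
    """Return d_1^2 + ... + d_n^2 for the line y = a + b x."""
    return sum((y - a - b * x) ** 2 for x, y in points)

points = [(0, 1), (1, 3), (2, 4), (3, 4)]
a_star, b_star = Fraction(3, 2), Fraction(1)      # coefficients of the least squares line

best = sum_squared_deviations(points, a_star, b_star)

# Perturb the intercept and slope by -1/10, 0, or +1/10 and recompute the sum.
steps = (Fraction(-1, 10), Fraction(0), Fraction(1, 10))
nearby = [sum_squared_deviations(points, a_star + da, b_star + db)
          for da in steps for db in steps]

print(best, min(nearby))
```

The unperturbed line appears once in the grid, so the smallest value in `nearby` equals `best` and is attained exactly once.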


Figure 6.5.2 $d_i$ measures the vertical error in the least squares straight line.


Normal Equations 


Recall from Theorem 6.4.2 that the least squares solutions of system 1 can be obtained by solving the associated
normal system
$$M^T M v = M^T y$$

the equations of which are called the normal equations.


In the exercises it will be shown that the column vectors of M are linearly independent if and only if the n data
points do not lie on a vertical line in the xy-plane. In this case it follows from Theorem 6.4.4 that the least
squares solution is unique and is given by

$$v^* = (M^T M)^{-1} M^T y$$


In summary, we have the following theorem. 


THEOREM 6.5.1 Uniqueness of the Least Squares Solution 


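Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a set of two or more data points, not all lying on a vertical
line, and let

$$M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \quad \text{and} \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
Then there is a unique least squares straight line fit
$$y = a^* + b^* x$$

to the data points. Moreover,

$$v^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix}$$

is given by the formula
$$v^* = (M^T M)^{-1} M^T y \tag{6}$$

which expresses the fact that $v = v^*$ is the unique solution of the normal equations

$$M^T M v = M^T y \tag{7}$$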


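EXAMPLE 1 Least Squares Straight Line Fit


Find the least squares straight line fit to the four points (0, 1), (1, 3), (2, 4), and (3, 4). (See
Figure 6.5.3.)


Solution We have

$$M = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad M^T M = \begin{bmatrix} 4 & 6 \\ 6 & 14 \end{bmatrix}, \quad \text{and} \quad (M^T M)^{-1} = \frac{1}{10} \begin{bmatrix} 7 & -3 \\ -3 & 2 \end{bmatrix}$$

$$v^* = (M^T M)^{-1} M^T y = \frac{1}{10} \begin{bmatrix} 7 & -3 \\ -3 & 2 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 4 \\ 4 \end{bmatrix} = \begin{bmatrix} 1.5 \\ 1 \end{bmatrix}$$

so the desired line is $y = 1.5 + x$.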
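For a straight line the normal system is only $2 \times 2$, so Formula 6 can be written out in closed form. The sketch below is an illustration with a hypothetical function name `fit_line`. It uses $M^T M = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}$ and $M^T y = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}$ together with the standard inverse of a $2 \times 2$ matrix, and it rejects data lying on a vertical line, for which $M^T M$ is singular.

```python
from fractions import Fraction

def fit_line(points):
    """Return (a, b) for the least squares line y = a + b x through the given points."""
    n = len(points)
    sx = sum(Fraction(x) for x, _ in points)          # sum of x_i
    sy = sum(Fraction(y) for _, y in points)          # sum of y_i
    sxx = sum(Fraction(x) ** 2 for x, _ in points)    # sum of x_i^2
    sxy = sum(Fraction(x) * Fraction(y) for x, y in points)
    det = n * sxx - sx * sx                           # det(M^T M)
    if det == 0:
        raise ValueError("all points lie on a vertical line; the fit is not unique")
    a = (sxx * sy - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

a, b = fit_line([(0, 1), (1, 3), (2, 4), (3, 4)])
print(a, b)   # 3/2 1
```

For the four points of Example 1 this reproduces the line $y = 1.5 + x$.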


EXAMPLE 2 Spring Constant


Hooke's law in physics states that the length x of a uniform spring is a linear function of the
force y applied to it. If we express this relationship as $y = a + bx$, then the coefficient b is
called the spring constant. Suppose a particular unstretched spring has a measured length of 6.1
inches (i.e., $x = 6.1$ when $y = 0$). Forces of 2 pounds, 4 pounds, and 6 pounds are then applied
to the spring, and the corresponding lengths are found to be 7.6 inches, 8.7 inches, and 10.4
inches (see Figure 6.5.4). Find the spring constant.


Figure 6.5.4

Solution We have

$$M = \begin{bmatrix} 1 & 6.1 \\ 1 & 7.6 \\ 1 & 8.7 \\ 1 & 10.4 \end{bmatrix}, \quad y = \begin{bmatrix} 0 \\ 2 \\ 4 \\ 6 \end{bmatrix}$$

and

$$v^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix} = (M^T M)^{-1} M^T y \approx \begin{bmatrix} -8.6 \\ 1.4 \end{bmatrix}$$
where the numerical values have been rounded to one decimal place. Thus, the estimated value
of the spring constant is $b^* \approx 1.4$ pounds/inch.

[Figure: Temperature of Venusian Atmosphere. Magellan orbit 3213; Date: 5 October 1991; Latitude: 67 N; LTST: 22:05. Vertical axis: Temperature T (K); horizontal axis: Altitude h (km). Source: NASA]


Historical Note On October 5, 1991 the Magellan spacecraft entered the atmosphere of Venus and
transmitted the temperature T in kelvins (K) versus the altitude h in kilometers (km) until its signal
was lost at an altitude of about 34 km. Discounting the initial erratic signal, the data strongly
suggested a linear relationship, so a least squares straight line fit was used on the linear part of the
data to obtain the equation

$$T = 737.5 - 8.125h$$
By setting $h = 0$ in this equation, the surface temperature of Venus was estimated at $T \approx 737.5$ K.


Least Squares Fit of a Polynomial 


The technique described for fitting a straight line to data points can be generalized to fitting a polynomial of
specified degree to data points. Let us attempt to fit a polynomial of fixed degree m

$$y = a_0 + a_1 x + \cdots + a_m x^m \tag{8}$$

to n points

$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$$
Substituting these n values of x and y into Formula 8 yields the n equations

$$\begin{aligned} y_1 &= a_0 + a_1 x_1 + \cdots + a_m x_1^m \\ y_2 &= a_0 + a_1 x_2 + \cdots + a_m x_2^m \\ &\;\;\vdots \\ y_n &= a_0 + a_1 x_n + \cdots + a_m x_n^m \end{aligned}$$
or, in matrix form,
$$y = Mv \tag{9}$$

where

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad M = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix}, \quad v = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix} \tag{10}$$

As before, the solutions of the normal equations
$$M^T M v = M^T y$$
determine the coefficients of the polynomial, and the vector v minimizes
$$\|y - Mv\|$$

Conditions that guarantee the invertibility of $M^T M$ are discussed in the exercises (Exercise 7). If $M^T M$ is
invertible, then the normal equations have a unique solution $v = v^*$, which is given by

$$v^* = (M^T M)^{-1} M^T y \tag{11}$$


EXAMPLE 3 Fitting a Quadratic Curve to Data


According to Newton's second law of motion, a body near the Earth's surface falls vertically
downward according to the equation

\[ s = s_0 + v_0 t + \tfrac{1}{2} g t^2 \tag{12} \]

where
$s$ = vertical displacement downward relative to some fixed point
$s_0$ = initial displacement at time $t = 0$
$v_0$ = initial velocity at time $t = 0$
$g$ = acceleration of gravity at the Earth's surface


Suppose that a laboratory experiment is performed to evaluate g from Equation 12 by releasing a
weight with unknown initial displacement and velocity and measuring the distance it has fallen at
certain times relative to a fixed reference point. Suppose it is found that at times
$t = .1, .2, .3, .4,$ and $.5$ seconds the weight has fallen $s = -0.18, 0.31, 1.03, 2.48,$ and $3.73$
feet, respectively, from the reference point. Find an approximate value of g using these data.


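Solution The mathematical problem is to fit a quadratic curve

\[ s = a_0 + a_1 t + a_2 t^2 \tag{13} \]

to the five data points:

\[ (.1, -0.18),\ (.2, 0.31),\ (.3, 1.03),\ (.4, 2.48),\ (.5, 3.73) \]

With the appropriate adjustments in notation, the matrices M and y in 10 are

\[ M = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ 1 & t_3 & t_3^2 \\ 1 & t_4 & t_4^2 \\ 1 & t_5 & t_5^2 \end{bmatrix} = \begin{bmatrix} 1 & .1 & .01 \\ 1 & .2 & .04 \\ 1 & .3 & .09 \\ 1 & .4 & .16 \\ 1 & .5 & .25 \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} s_1 \\ s_2 \\ s_3 \\ s_4 \\ s_5 \end{bmatrix} = \begin{bmatrix} -0.18 \\ 0.31 \\ 1.03 \\ 2.48 \\ 3.73 \end{bmatrix} \]

Thus, from 11,

\[ \mathbf{v}^* = \begin{bmatrix} a_0^* \\ a_1^* \\ a_2^* \end{bmatrix} = (M^T M)^{-1} M^T \mathbf{y} \approx \begin{bmatrix} -0.40 \\ 0.35 \\ 16.1 \end{bmatrix} \]

From 12 and 13, we have $a_2 = \tfrac{1}{2} g$, so the estimated value of g is

\[ g = 2a_2^* = 2(16.1) = 32.2 \text{ feet/second}^2 \]

If desired, we can also estimate the initial displacement and initial velocity of the weight:

\[ s_0 = a_0^* = -0.40 \text{ feet}, \qquad v_0 = a_1^* = 0.35 \text{ feet/second} \]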


In Figure 6.5.5 we have plotted the five data points and the approximating polynomial. 
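
The quadratic fit in this example can be reproduced numerically. The following is a sketch only: it builds the matrix M of Formula 10, forms the normal equations, and solves them with a small Gaussian elimination routine. The helper names `solve` and `polyfit_normal` are hypothetical. Note that doubling the unrounded coefficient gives roughly 32.1, whereas the text doubles the rounded value 16.1.

```python
# Polynomial least squares via the normal equations (M^T M) v = M^T y.
# Helper names are illustrative assumptions; only the standard library is used.

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def polyfit_normal(ts, ys, degree):
    """Return [a_0, ..., a_degree] for the least squares polynomial fit."""
    M = [[t ** k for k in range(degree + 1)] for t in ts]   # rows 1, t, t^2, ...
    cols = range(degree + 1)
    MtM = [[sum(row[i] * row[j] for row in M) for j in cols] for i in cols]
    Mty = [sum(row[i] * y for row, y in zip(M, ys)) for i in cols]
    return solve(MtM, Mty)


ts = [0.1, 0.2, 0.3, 0.4, 0.5]
ss = [-0.18, 0.31, 1.03, 2.48, 3.73]
a0, a1, a2 = polyfit_normal(ts, ss, 2)
g_est = 2 * a2   # from s = s0 + v0*t + (g/2)*t^2
print(round(a0, 2), round(a1, 2), round(a2, 1), round(g_est, 1))
```

Solving the normal equations directly is fine for small, well-scaled problems like this one; for higher degrees a QR-based solver is usually preferred because $M^T M$ can be badly conditioned.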


Figure 6.5.5 Distance s (in feet) versus time t (in seconds)


Concept Review
• Least squares straight line fit
• Regression line
• Least squares polynomial fit


Skills
• Find the least squares straight line fit to a set of data points.
• Find the least squares polynomial fit to a set of data points.
• Use the techniques of this section to solve applied problems.


Exercise Set 6.5 


1. Find the least squares straight line fit to the three points (0, 0), (1, 2), and (2, 7).
Answer:
$y = -\frac{1}{2} + \frac{7}{2}x$

2. Find the least squares straight line fit to the four points (0, 1), (2, 0), (3, 1), and (3, 2).

3. Find the quadratic polynomial that best fits the four points (2, 0), (3, -10), (5, -48), and (6, -76).
Answer:
$y = 2 + 5x - 3x^2$

4. Find the cubic polynomial that best fits the five points (-1, -14), (0, -5), (1, -4), (2, 1), and (3, 22).

5. Show that the matrix M in Equation 2 has linearly independent columns if and only if at least two of the
numbers $x_1, x_2, \ldots, x_n$ are distinct.

6. Show that the columns of the $n \times (m + 1)$ matrix M in Equation 10 are linearly independent if $n > m$ and
at least $m + 1$ of the numbers $x_1, x_2, \ldots, x_n$ are distinct. [Hint: A nonzero polynomial of degree m has at
most m distinct roots.]

7. Let M be the matrix in Equation 10. Using Exercise 6, show that a sufficient condition for the matrix
$M^T M$ to be invertible is that $n > m$ and that at least $m + 1$ of the numbers $x_1, x_2, \ldots, x_n$ are distinct.

8. The owner of a rapidly expanding business finds that for the first five months of the year the sales (in
thousands) are $4.0, $4.4, $5.2, $6.4, and $8.0. The owner plots these figures on a graph and conjectures
that for the rest of the year, the sales curve can be approximated by a quadratic polynomial. Find the least
squares quadratic polynomial fit to the sales curve, and use it to project the sales for the twelfth month of
the year.


9. A corporation obtains the following data relating the number of sales representatives on its staff to annual
sales:

Number of Sales Representatives | 5 | 10 | 15 | 20 | 25 | 30


stn 4 8 [2 [os [7S 


Explain how you might use least squares methods to estimate the annual sales with 45 representatives, and 
discuss the assumptions that you are making. (You need not perform the actual computations.) 


10. Pathfinder is an experimental, lightweight, remotely piloted, solar-powered aircraft that was used in a series
of experiments by NASA to determine the feasibility of applying solar power for long-duration, high-
altitude flight. In August 1997 Pathfinder recorded the data in the accompanying table relating altitude H
and temperature T. Show that a linear model is reasonable by plotting the data, and then find the least
squares line $H = H_0 + kT$ of best fit.

Table Ex-10
Altitude H (thousands of feet) | 15 | 20 | 25 | 30 | 35 | 40 | 45
Temperature T (°C) | 4.5 | -5.9 | -16.1 | -27.6 | -39.8 | -50.2 | -62.9

11. Find a curve of the form $y = a + (b / x)$ that best fits the data points (1, 7), (3, 3), (6, 1) by making the
substitution $X = 1 / x$. Draw the curve and plot the data points in the same coordinate system.

Answer:
$y = \frac{5}{21} + \frac{48}{7x}$


True-False Exercises 


In parts (a)-(d) determine whether the statement is true or false, and justify your answer.

(a) Every set of data points has a unique least squares straight line fit.
Answer:
False

(b) If the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are not collinear, then 1 is an inconsistent system.
Answer:
True

(c) If $y = a + bx$ is the least squares line fit to the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, then
$d_i = |y_i - (a + bx_i)|$ is minimal for every $1 \le i \le n$.
Answer:
False

(d) If $y = a + bx$ is the least squares line fit to the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, then
$\sum_{i=1}^{n} [y_i - (a + bx_i)]^2$ is minimal.
Answer:
True




6.6 Function Approximation; Fourier Series 


In this section we will show how orthogonal projections can be used to approximate certain types of functions by
simpler functions that are easier to work with. The ideas explained here have important applications in
engineering and science. Calculus is required.


Best Approximations 


All of the problems that we will study in this section will be special cases of the following general problem. 


APPROXIMATION PROBLEM 


Given a function f that is continuous on an interval [a, b], find the “best possible approximation” to f
using only functions from a specified subspace W of C[a, b].


Here are some examples of such problems:
(a) Find the best possible approximation to $e^x$ over [0, 1] by a polynomial of the form $a_0 + a_1 x + a_2 x^2$.
(b) Find the best possible approximation to $\sin \pi x$ over [-1, 1] by a function of the form
$a_0 + a_1 e^x + a_2 e^{2x} + a_3 e^{3x}$.
(c) Find the best possible approximation to x over $[0, 2\pi]$ by a function of the form
$a_0 + a_1 \sin x + a_2 \sin 2x + b_1 \cos x + b_2 \cos 2x$.
In the first example W is the subspace of C[0, 1] spanned by $1$, $x$, and $x^2$; in the second example W is the
subspace of C[-1, 1] spanned by $1$, $e^x$, $e^{2x}$, and $e^{3x}$; and in the third example W is the subspace of
$C[0, 2\pi]$ spanned by $1$, $\sin x$, $\sin 2x$, $\cos x$, and $\cos 2x$.


Measurements of Error 


To solve approximation problems of the preceding types, we first need to make the phrase “best
approximation over [a, b]” mathematically precise. To do this we will need some way of quantifying the
error that results when one continuous function is approximated by another over an interval [a, b]. If we
were to approximate $f(x)$ by $g(x)$, and if we were concerned only with the error in that approximation at a
single point $x_0$, then it would be natural to define the error to be

\[ \text{error} = |f(x_0) - g(x_0)| \]

sometimes called the deviation between f and g at $x_0$ (Figure 6.6.1). However, we are not concerned simply
with measuring the error at a single point but rather with measuring it over the entire interval [a, b]. The
problem is that an approximation may have small deviations in one part of the interval and large deviations in
another. One possible way of accounting for this is to integrate the deviation $|f(x) - g(x)|$ over the interval
[a, b] and define the error over the interval to be

\[ \text{error} = \int_a^b |f(x) - g(x)| \, dx \tag{1} \]

Geometrically, 1 is the area between the graphs of $f(x)$ and $g(x)$ over the interval [a, b] (Figure 6.6.2); the
greater the area, the greater the overall error.


Figure 6.6.1 The deviation between f and g at $x_0$


Figure 6.6.2 The area between the graphs of f and g over [a, b] measures the error in approximating f
by g over [a, b]


Although 1 is natural and appealing geometrically, most mathematicians and scientists generally favor the
following alternative measure of error, called the mean square error:

\[ \text{mean square error} = \int_a^b [f(x) - g(x)]^2 \, dx \]

Mean square error emphasizes the effect of larger errors because of the squaring and has the added advantage
that it allows us to bring to bear the theory of inner product spaces. To see how, suppose that f is a continuous
function on [a, b] that we want to approximate by a function g from a subspace W of C[a, b], and suppose
that C[a, b] is given the inner product

\[ \langle f, g \rangle = \int_a^b f(x) g(x) \, dx \]

It follows that

\[ \|f - g\|^2 = \langle f - g, f - g \rangle = \int_a^b [f(x) - g(x)]^2 \, dx = \text{mean square error} \]

so minimizing the mean square error is the same as minimizing $\|f - g\|^2$. Thus the approximation problem
posed informally at the beginning of this section can be restated more precisely as follows.
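
To make the two error measures concrete, here is a small numerical sketch. It assumes the sample pair $f(x) = e^x$ and $g(x) = 1 + x$ on [0, 1] (chosen for illustration, not taken from the text) and uses a hand-written composite Simpson's rule, `simpson`, as a stand-in for exact integration.

```python
import math


def simpson(h, a, b, n=1000):
    """Composite Simpson's rule for the integral of h over [a, b]; n must be even."""
    step = (b - a) / n
    total = h(a) + h(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * h(a + i * step)
    return total * step / 3


f = math.exp              # function being approximated (illustrative choice)
g = lambda x: 1 + x       # approximating function (illustrative choice)

# Area between the graphs: integral of |f - g|
area_error = simpson(lambda x: abs(f(x) - g(x)), 0.0, 1.0)

# Mean square error: integral of (f - g)^2, which equals ||f - g||^2
mean_square_error = simpson(lambda x: (f(x) - g(x)) ** 2, 0.0, 1.0)

print(area_error, mean_square_error)
```

For this pair both integrals can also be done by hand, giving $e - \tfrac{5}{2}$ for the area and $\tfrac{1}{2}(e^2 - 1) - 2e + \tfrac{7}{3}$ for the mean square error, which is a useful check on the quadrature.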


Least Squares Approximation 


LEAST SQUARES APPROXIMATION PROBLEM 


Let f be a function that is continuous on an interval [a, b], let C[a, b] have the inner product

\[ \langle f, g \rangle = \int_a^b f(x) g(x) \, dx \]

and let W be a finite-dimensional subspace of C[a, b]. Find a function g in W that minimizes

\[ \|f - g\|^2 = \int_a^b [f(x) - g(x)]^2 \, dx \]

Since $\|f - g\|^2$ and $\|f - g\|$ are minimized by the same function g, this problem is equivalent to looking for a
function g in W that is closest to f. But we know from Theorem 6.4.1 that $g = \operatorname{proj}_W f$ is such a function
(Figure 6.6.3).


Figure 6.6.3 f = function in C[a, b] to be approximated; $g = \operatorname{proj}_W f$ = least squares approximation to f from W; W = subspace of approximating functions


Thus, we have the following result. 


THEOREM 6.6.1 


If f is a continuous function on [a, b], and W is a finite-dimensional subspace of C[a, b], then the
function g in W that minimizes the mean square error

\[ \int_a^b [f(x) - g(x)]^2 \, dx \]

is $g = \operatorname{proj}_W f$, where the orthogonal projection is relative to the inner product

\[ \langle f, g \rangle = \int_a^b f(x) g(x) \, dx \]

The function $g = \operatorname{proj}_W f$ is called the least squares approximation to f from W.
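
The projection in this theorem can be computed without first orthonormalizing a basis. The sketch below assumes a basis $w_1, \ldots, w_r$ of W and uses the fact that $f - \operatorname{proj}_W f$ is orthogonal to every $w_i$, which leads to the linear system $Gc = r$ with $G_{ij} = \langle w_i, w_j \rangle$ and $r_i = \langle f, w_i \rangle$. The example pair ($e^x$ on [0, 1], W spanned by $1, x, x^2$) and the helper names `simpson`, `inner`, `solve`, and `project` are illustrative assumptions, and the integrals are approximated numerically.

```python
import math


def simpson(h, a, b, n=1000):
    """Composite Simpson's rule for the integral of h over [a, b]; n must be even."""
    step = (b - a) / n
    total = h(a) + h(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * h(a + i * step)
    return total * step / 3


def inner(u, v, a, b):
    """Inner product <u, v> = integral of u(x) v(x) over [a, b]."""
    return simpson(lambda x: u(x) * v(x), a, b)


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def project(f, basis, a, b):
    """Coefficients of proj_W f with respect to the given basis of W."""
    G = [[inner(wi, wj, a, b) for wj in basis] for wi in basis]
    r = [inner(f, wi, a, b) for wi in basis]
    return solve(G, r)


basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
coeffs = project(math.exp, basis, 0.0, 1.0)
g = lambda x: sum(c * w(x) for c, w in zip(coeffs, basis))

# The residual f - g should be orthogonal to each basis function of W.
residual_checks = [inner(lambda x: math.exp(x) - g(x), w, 0.0, 1.0) for w in basis]
print(coeffs)
```

Solving with the Gram matrix is a different route from applying the Gram-Schmidt process, but both produce the same function g because the orthogonal projection onto W is unique.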


Fourier Series 


A function of the form

\[ T(x) = c_0 + c_1 \cos x + c_2 \cos 2x + \cdots + c_n \cos nx + d_1 \sin x + d_2 \sin 2x + \cdots + d_n \sin nx \tag{2} \]

is called a trigonometric polynomial; if $c_n$ and $d_n$ are not both zero, then $T(x)$ is said to have order n. For
example,

\[ T(x) = 2 + \cos x - 3 \cos 2x + 7 \sin 4x \]

is a trigonometric polynomial of order 4 with

\[ c_0 = 2,\ c_1 = 1,\ c_2 = -3,\ c_3 = 0,\ c_4 = 0,\ d_1 = 0,\ d_2 = 0,\ d_3 = 0,\ d_4 = 7 \]

It is evident from 2 that the trigonometric polynomials of order n or less are the various possible linear
combinations of

\[ 1,\ \cos x,\ \cos 2x,\ \ldots,\ \cos nx,\ \sin x,\ \sin 2x,\ \ldots,\ \sin nx \tag{3} \]

It can be shown that these $2n + 1$ functions are linearly independent and thus form a basis for a
$(2n + 1)$-dimensional subspace of C[a, b].


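Let us now consider the problem of finding the least squares approximation of a continuous function $f(x)$
over the interval $[0, 2\pi]$ by a trigonometric polynomial of order n or less. As noted above, the least squares
approximation to f from W is the orthogonal projection of f on W. To find this orthogonal projection, we must
find an orthonormal basis $g_0, g_1, \ldots, g_{2n}$ for W, after which we can compute the orthogonal projection on W
from the formula

\[ \operatorname{proj}_W f = \langle f, g_0 \rangle g_0 + \langle f, g_1 \rangle g_1 + \cdots + \langle f, g_{2n} \rangle g_{2n} \tag{4} \]

(see Theorem 6.3.4b). An orthonormal basis for W can be obtained by applying the Gram-Schmidt process to
the basis vectors in 3 using the inner product

\[ \langle f, g \rangle = \int_0^{2\pi} f(x) g(x) \, dx \]

This yields the orthonormal basis

\[ g_0 = \frac{1}{\sqrt{2\pi}},\quad g_1 = \frac{1}{\sqrt{\pi}} \cos x,\ \ldots,\ g_n = \frac{1}{\sqrt{\pi}} \cos nx,\quad g_{n+1} = \frac{1}{\sqrt{\pi}} \sin x,\ \ldots,\ g_{2n} = \frac{1}{\sqrt{\pi}} \sin nx \tag{5} \]

(see Exercise 6). If we introduce the notation

\[ \begin{aligned} a_0 &= \frac{2}{\sqrt{2\pi}} \langle f, g_0 \rangle, \quad a_1 = \frac{1}{\sqrt{\pi}} \langle f, g_1 \rangle,\ \ldots,\ a_n = \frac{1}{\sqrt{\pi}} \langle f, g_n \rangle \\ b_1 &= \frac{1}{\sqrt{\pi}} \langle f, g_{n+1} \rangle,\ \ldots,\ b_n = \frac{1}{\sqrt{\pi}} \langle f, g_{2n} \rangle \end{aligned} \tag{6} \]

then on substituting 5 in 4, we obtain

\[ \operatorname{proj}_W f = \frac{a_0}{2} + [a_1 \cos x + \cdots + a_n \cos nx] + [b_1 \sin x + \cdots + b_n \sin nx] \tag{7} \]

where

\[ \begin{aligned} a_0 &= \frac{2}{\sqrt{2\pi}} \langle f, g_0 \rangle = \frac{2}{\sqrt{2\pi}} \int_0^{2\pi} f(x) \frac{1}{\sqrt{2\pi}} \, dx = \frac{1}{\pi} \int_0^{2\pi} f(x) \, dx \\ a_1 &= \frac{1}{\sqrt{\pi}} \langle f, g_1 \rangle = \frac{1}{\sqrt{\pi}} \int_0^{2\pi} f(x) \frac{1}{\sqrt{\pi}} \cos x \, dx = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos x \, dx \\ &\ \ \vdots \\ a_n &= \frac{1}{\sqrt{\pi}} \langle f, g_n \rangle = \frac{1}{\sqrt{\pi}} \int_0^{2\pi} f(x) \frac{1}{\sqrt{\pi}} \cos nx \, dx = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos nx \, dx \\ b_1 &= \frac{1}{\sqrt{\pi}} \langle f, g_{n+1} \rangle = \frac{1}{\sqrt{\pi}} \int_0^{2\pi} f(x) \frac{1}{\sqrt{\pi}} \sin x \, dx = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin x \, dx \\ &\ \ \vdots \\ b_n &= \frac{1}{\sqrt{\pi}} \langle f, g_{2n} \rangle = \frac{1}{\sqrt{\pi}} \int_0^{2\pi} f(x) \frac{1}{\sqrt{\pi}} \sin nx \, dx = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin nx \, dx \end{aligned} \]

In short,

\[ a_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos kx \, dx, \qquad b_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin kx \, dx \tag{8} \]

The numbers $a_0, a_1, \ldots, a_n, b_1, \ldots, b_n$ are called the Fourier coefficients of f.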
EXAMPLE 1 Least Squares Approximations


Find the least squares approximation of $f(x) = x$ on $[0, 2\pi]$ by
(a) a trigonometric polynomial of order 2 or less;
(b) a trigonometric polynomial of order n or less.


Solution

(a)
\[ a_0 = \frac{1}{\pi} \int_0^{2\pi} f(x) \, dx = \frac{1}{\pi} \int_0^{2\pi} x \, dx = 2\pi \tag{9a} \]

For $k = 1, 2, \ldots$, integration by parts yields (verify)

\[ a_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos kx \, dx = \frac{1}{\pi} \int_0^{2\pi} x \cos kx \, dx = 0 \tag{9b} \]

\[ b_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin kx \, dx = \frac{1}{\pi} \int_0^{2\pi} x \sin kx \, dx = -\frac{2}{k} \tag{9c} \]

Thus, the least squares approximation to x on $[0, 2\pi]$ by a trigonometric polynomial of
order 2 or less is

\[ x \approx \frac{a_0}{2} + a_1 \cos x + a_2 \cos 2x + b_1 \sin x + b_2 \sin 2x \]

or, from (9a), (9b), and (9c),

\[ x \approx \pi - 2 \sin x - \sin 2x \]

(b) The least squares approximation to x on $[0, 2\pi]$ by a trigonometric polynomial of order n
or less is

\[ x \approx \frac{a_0}{2} + [a_1 \cos x + \cdots + a_n \cos nx] + [b_1 \sin x + \cdots + b_n \sin nx] \]

or, from (9a), (9b), and (9c),

\[ x \approx \pi - 2 \left( \sin x + \frac{\sin 2x}{2} + \frac{\sin 3x}{3} + \cdots + \frac{\sin nx}{n} \right) \]

The graphs of $y = x$ and some of these approximations are shown in Figure 6.6.4.


Figure 6.6.4 Graphs of $y = x$ and several of the trigonometric approximations


It is natural to expect that the mean square error will diminish as the number of terms in the
least squares approximation

\[ f(x) \approx \frac{a_0}{2} + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx) \]

increases. It can be proved that for any function f in $C[0, 2\pi]$, the mean square error
approaches zero as $n \to +\infty$; this is denoted by writing

\[ f(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx) \]

The right side of this equation is called the Fourier series for f over the interval $[0, 2\pi]$.
Such series are of major importance in engineering, science, and mathematics.
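
The Fourier coefficients in Formula 8 can be approximated numerically, which gives a quick check of (9a), (9b), and (9c) for $f(x) = x$. This is a sketch under stated assumptions: the integrals are approximated by a hand-written composite Simpson's rule, and the names `simpson` and `fourier_coefficients` are hypothetical.

```python
import math


def simpson(h, a, b, n=2000):
    """Composite Simpson's rule for the integral of h over [a, b]; n must be even."""
    step = (b - a) / n
    total = h(a) + h(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * h(a + i * step)
    return total * step / 3


def fourier_coefficients(f, n):
    """Return ([a_0, ..., a_n], [b_1, ..., b_n]) for f on [0, 2*pi]."""
    two_pi = 2 * math.pi
    a = [simpson(lambda x, k=k: f(x) * math.cos(k * x), 0.0, two_pi) / math.pi
         for k in range(n + 1)]
    b = [simpson(lambda x, k=k: f(x) * math.sin(k * x), 0.0, two_pi) / math.pi
         for k in range(1, n + 1)]
    return a, b


a, b = fourier_coefficients(lambda x: x, 3)


def approx(x):
    """Trigonometric polynomial of order 3 built from the computed coefficients."""
    return a[0] / 2 + sum(a[k] * math.cos(k * x) + b[k - 1] * math.sin(k * x)
                          for k in range(1, 4))


print(a, b)
```

The computed values should be close to $a_0 = 2\pi$, $a_k = 0$, and $b_k = -2/k$, so that `approx` agrees with $\pi - 2(\sin x + \tfrac{1}{2}\sin 2x + \tfrac{1}{3}\sin 3x)$ up to quadrature error.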


Jean Baptiste Fourier (1768-1830) 


Historical Note Fourier was a French mathematician and physicist who discovered 
the Fourier series and related ideas while working on problems of heat diffusion. This 
discovery was one of the most influential in the history of mathematics; it is the 
cornerstone of many fields of mathematical research and a basic tool in many branches 
of engineering. Fourier, a political activist during the French revolution, spent time in 
jail for his defense of many victims during the Terror. He later became a favorite of 
Napoleon and was named a baron. 

[Image. The Granger Collection, New York] 


Concept Review
• Approximation of functions
• Mean square error
• Least squares approximation
• Trigonometric polynomial
• Fourier coefficients
• Fourier series


Skills
• Find the least squares approximation of a function.
• Find the mean square error of the least squares approximation of a function.
• Compute the Fourier series of a function.


Exercise Set 6.6 


1. Find the least squares approximation of $f(x) = 1 + x$ over the interval $[0, 2\pi]$ by
(a) a trigonometric polynomial of order 2 or less.
(b) a trigonometric polynomial of order n or less.

Answer:
(a) $(1 + \pi) - 2 \sin x - \sin 2x$
(b) $(1 + \pi) - 2 \left( \sin x + \frac{\sin 2x}{2} + \frac{\sin 3x}{3} + \cdots + \frac{\sin nx}{n} \right)$

2. Find the least squares approximation of $f(x) = x^2$ over the interval $[0, 2\pi]$ by
(a) a trigonometric polynomial of order 3 or less.
(b) a trigonometric polynomial of order n or less.

3. (a) Find the least squares approximation of x over the interval [0, 1] by a function of the form $a + be^x$.
(b) Find the mean square error of the approximation.

Answer:
(a) $-\frac{1}{2} + \frac{1}{e - 1} e^x$
(b) $\frac{1}{12} + \frac{3 - e}{2(1 - e)}$

4. (a) Find the least squares approximation of $e^x$ over the interval [0, 1] by a polynomial of the form
$a_0 + a_1 x$.
(b) Find the mean square error of the approximation.

5. (a) Find the least squares approximation of $\sin \pi x$ over the interval [-1, 1] by a polynomial of the form
$a_0 + a_1 x + a_2 x^2$.
(b) Find the mean square error of the approximation.

Answer:
(a) $\frac{3x}{\pi}$
(b) $1 - \frac{6}{\pi^2}$

6. Use the Gram-Schmidt process to obtain the orthonormal basis 5 from the basis 3.

7. Carry out the integrations indicated in Formulas 9a, 9b, and 9c.

8. Find the Fourier series of $f(x) = \pi - x$ over the interval $[0, 2\pi]$.

9. Find the Fourier series of $f(x) = 1$, $0 < x < \pi$ and $f(x) = 0$, $\pi \le x \le 2\pi$ over the interval $[0, 2\pi]$.
Answer:
$\frac{1}{2} + \sum_{k=1}^{\infty} \frac{1}{\pi k} \left[ 1 - (-1)^k \right] \sin kx$

10. What is the Fourier series of $\sin(3x)$?

True-False Exercises 

In parts (a)-(e) determine whether the statement is true or false, and justify your answer.

(a) If a function f in C[a, b] is approximated by the function g, then the mean square error is the same as the
area between the graphs of $f(x)$ and $g(x)$ over the interval [a, b].
Answer:
False

(b) Given a finite-dimensional subspace W of C[a, b], the function $g = \operatorname{proj}_W f$ minimizes the mean square
error.
Answer:
True

(c) $\{1, \cos x, \sin x, \cos 2x, \sin 2x\}$ is an orthogonal subset of the vector space $C[0, 2\pi]$ with respect to the
inner product $\langle f, g \rangle = \int_0^{2\pi} f(x) g(x) \, dx$.
Answer:
True

(d) $\{1, \cos x, \sin x, \cos 2x, \sin 2x\}$ is an orthonormal subset of the vector space $C[0, 2\pi]$ with respect to the
inner product $\langle f, g \rangle = \int_0^{2\pi} f(x) g(x) \, dx$.
Answer:
False

(e) $\{1, \cos x, \sin x, \cos 2x, \sin 2x\}$ is a linearly independent subset of $C[0, 2\pi]$.
Answer:
True




Chapter 6 Supplementary Exercises 


1. Let $R^4$ have the Euclidean inner product.
(a) Find a vector in $R^4$ that is orthogonal to $\mathbf{u}_1 = (1, 0, 0, 0)$ and $\mathbf{u}_4 = (0, 0, 0, 1)$ and makes equal
angles with $\mathbf{u}_2 = (0, 1, 0, 0)$ and $\mathbf{u}_3 = (0, 0, 1, 0)$.
(b) Find a vector $\mathbf{x} = (x_1, x_2, x_3, x_4)$ of length 1 that is orthogonal to $\mathbf{u}_1$ and $\mathbf{u}_4$ above and such that the
cosine of the angle between x and $\mathbf{u}_2$ is twice the cosine of the angle between x and $\mathbf{u}_3$.

Answer:
(a) $(0, a, a, 0)$ with $a \ne 0$
(b) $\pm \left( 0, \frac{2}{\sqrt{5}}, \frac{1}{\sqrt{5}}, 0 \right)$

2. Prove: If $\langle \mathbf{u}, \mathbf{v} \rangle$ is the Euclidean inner product on $R^n$, and if A is an $n \times n$ matrix, then

\[ \langle \mathbf{u}, A\mathbf{v} \rangle = \langle A^T \mathbf{u}, \mathbf{v} \rangle \]

[Hint: Use the fact that $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u} \cdot \mathbf{v} = \mathbf{v}^T \mathbf{u}$.]

3. Let $M_{22}$ have the inner product $\langle U, V \rangle = \operatorname{tr}(U^T V) = \operatorname{tr}(V^T U)$ that was defined in Example 6 of
Section 6.1. Describe the orthogonal complement of
(a) the subspace of all diagonal matrices.
(b) the subspace of symmetric matrices.
Answer:
(a) The subspace of all matrices in $M_{22}$ with only zeros on the diagonal.
(b) The subspace of all skew-symmetric matrices in $M_{22}$.

4. Let $A\mathbf{x} = \mathbf{0}$ be a system of m equations in n unknowns. Show that

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \]

is a solution of this system if and only if the vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ is orthogonal to every row vector
of A with respect to the Euclidean inner product on $R^n$.

5. Use the Cauchy-Schwarz inequality to show that if $a_1, a_2, \ldots, a_n$ are positive real numbers, then

\[ (a_1 + a_2 + \cdots + a_n) \left( \frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n} \right) \ge n^2 \]

6. Show that if x and y are vectors in an inner product space and c is any scalar, then

\[ \|c\mathbf{x} + \mathbf{y}\|^2 = c^2 \|\mathbf{x}\|^2 + 2c \langle \mathbf{x}, \mathbf{y} \rangle + \|\mathbf{y}\|^2 \]


7. Let $R^3$ have the Euclidean inner product. Find two vectors of length 1 that are orthogonal to all three of
the vectors $\mathbf{u}_1 = (1, 1, -1)$, $\mathbf{u}_2 = (-2, -1, 2)$, and $\mathbf{u}_3 = (-1, 0, 1)$.

Answer:
$\pm \left( \frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}} \right)$

8. Find a weighted Euclidean inner product on $R^n$ such that the vectors

\[ \begin{aligned} \mathbf{v}_1 &= (1, 0, 0, \ldots, 0) \\ \mathbf{v}_2 &= (0, \sqrt{2}, 0, \ldots, 0) \\ \mathbf{v}_3 &= (0, 0, \sqrt{3}, \ldots, 0) \\ &\ \ \vdots \\ \mathbf{v}_n &= (0, 0, 0, \ldots, \sqrt{n}) \end{aligned} \]

form an orthonormal set.

9. Is there a weighted Euclidean inner product on $R^2$ for which the vectors (1, 2) and (3, -1) form an
orthonormal set? Justify your answer.

Answer:
No


10. If u and v are vectors in an inner product space V, then u, v, and $\mathbf{u} - \mathbf{v}$ can be regarded as sides of a
“triangle” in V (see the accompanying figure). Prove that the law of cosines holds for any such triangle;
that is,

\[ \|\mathbf{u} - \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - 2 \|\mathbf{u}\| \|\mathbf{v}\| \cos \theta \]

where $\theta$ is the angle between u and v.

Figure Ex-10

11. (a) As shown in Figure 3.2.6, the vectors $(k, 0, 0)$, $(0, k, 0)$, and $(0, 0, k)$ form the edges of a cube in $R^3$
with diagonal $(k, k, k)$. Similarly, the vectors

\[ (k, 0, 0, \ldots, 0),\ (0, k, 0, \ldots, 0),\ \ldots,\ (0, 0, 0, \ldots, k) \]

can be regarded as edges of a “cube” in $R^n$ with diagonal $(k, k, k, \ldots, k)$. Show that each of the above
edges makes an angle of $\theta$ with the diagonal, where $\cos \theta = 1 / \sqrt{n}$.

(b) Calculus required What happens to the angle $\theta$ in part (a) as the dimension of $R^n$ approaches $+\infty$?

Answer:
(b) $\theta$ approaches $\frac{\pi}{2}$


12. Let u and v be vectors in an inner product space.
(a) Prove that $\|\mathbf{u}\| = \|\mathbf{v}\|$ if and only if $\mathbf{u} + \mathbf{v}$ and $\mathbf{u} - \mathbf{v}$ are orthogonal.
(b) Give a geometric interpretation of this result in $R^2$ with the Euclidean inner product.

13. Let u be a vector in an inner product space V, and let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be an orthonormal basis for V.
Show that if $\alpha_i$ is the angle between u and $\mathbf{v}_i$, then

\[ \cos^2 \alpha_1 + \cos^2 \alpha_2 + \cdots + \cos^2 \alpha_n = 1 \]

14. Prove: If $\langle \mathbf{u}, \mathbf{v} \rangle_1$ and $\langle \mathbf{u}, \mathbf{v} \rangle_2$ are two inner products on a vector space V, then the quantity
$\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle_1 + \langle \mathbf{u}, \mathbf{v} \rangle_2$ is also an inner product.

15. Prove Theorem 6.2.5.

16. Prove: If A has linearly independent column vectors, and if b is orthogonal to the column space of A, then
the least squares solution of $A\mathbf{x} = \mathbf{b}$ is $\mathbf{x} = \mathbf{0}$.

17. Is there any value of s for which $x_1 = 1$ and $x_2 = 2$ is the least squares solution of the following linear
system?

\[ \begin{aligned} x_1 - x_2 &= 1 \\ 2x_1 + 3x_2 &= 1 \\ 4x_1 + 5x_2 &= s \end{aligned} \]

Explain your reasoning.
Answer:
No

18. Show that if p and q are distinct positive integers, then the functions $f(x) = \sin px$ and $g(x) = \sin qx$ are
orthogonal with respect to the inner product

\[ \langle f, g \rangle = \int_0^{2\pi} f(x) g(x) \, dx \]

19. Show that if p and q are positive integers, then the functions $f(x) = \cos px$ and $g(x) = \sin qx$ are
orthogonal with respect to the inner product

\[ \langle f, g \rangle = \int_0^{2\pi} f(x) g(x) \, dx \]




CHAPTER 7


Diagonalization and 
Quadratic Forms 


CHAPTER CONTENTS 


7.1. Orthogonal Matrices 

7.2. Orthogonal Diagonalization 

7.3. Quadratic Forms 

7.4. Optimization Using Quadratic Forms 


7.5. Hermitian, Unitary, and Normal Matrices 


INTRODUCTION 


In Section 5.2 we found conditions that guaranteed the diagonalizability of an $n \times n$
matrix, but we did not consider what class or classes of matrices might actually satisfy 
those conditions. In this chapter we will show that every symmetric matrix is 
diagonalizable. This is an extremely important result because many applications utilize it 
in some essential way. 




7.1 Orthogonal Matrices 


In this section we will discuss the class of matrices whose inverses can be obtained by transposition. Such matrices occur in a variety of 
applications and arise as well as transition matrices when one orthonormal basis is changed to another. 


Orthogonal Matrices 


We begin with the following definition. 


DEFINITION 1 
A square matrix A is said to be orthogonal if its transpose is the same as its inverse, that is, if

\[ A^{-1} = A^T \]

or, equivalently, if

\[ AA^T = A^T A = I \tag{1} \]


Recall from Theorem 1.6.3 that if either product in 1 holds, then
so does the other. Thus, A is orthogonal if either $AA^T = I$ or
$A^T A = I$.


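EXAMPLE 1 A 3 × 3 Orthogonal Matrix


The matrix

\[ A = \begin{bmatrix} \frac{3}{7} & \frac{2}{7} & \frac{6}{7} \\ -\frac{6}{7} & \frac{3}{7} & \frac{2}{7} \\ \frac{2}{7} & \frac{6}{7} & -\frac{3}{7} \end{bmatrix} \]

is orthogonal since

\[ A^T A = \begin{bmatrix} \frac{3}{7} & -\frac{6}{7} & \frac{2}{7} \\ \frac{2}{7} & \frac{3}{7} & \frac{6}{7} \\ \frac{6}{7} & \frac{2}{7} & -\frac{3}{7} \end{bmatrix} \begin{bmatrix} \frac{3}{7} & \frac{2}{7} & \frac{6}{7} \\ -\frac{6}{7} & \frac{3}{7} & \frac{2}{7} \\ \frac{2}{7} & \frac{6}{7} & -\frac{3}{7} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]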
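
A check of this kind is easy to automate with exact rational arithmetic. The sketch below assumes the matrix of Example 1 has rows (3/7, 2/7, 6/7), (-6/7, 3/7, 2/7), (2/7, 6/7, -3/7); the helper names `transpose`, `matmul`, and `is_orthogonal` are hypothetical.

```python
from fractions import Fraction as F

# Entries read from Example 1 (an assumption about the garbled source matrix).
A = [[F(3, 7), F(2, 7), F(6, 7)],
     [F(-6, 7), F(3, 7), F(2, 7)],
     [F(2, 7), F(6, 7), F(-3, 7)]]


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Return the matrix product XY."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def is_orthogonal(M):
    """True when M^T M equals the identity exactly."""
    n = len(M)
    identity = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    return matmul(transpose(M), M) == identity


# Orthogonal columns alone are not enough: they must also have length 1.
scaled = [[F(2), F(0)], [F(0), F(1)]]

print(is_orthogonal(A), is_orthogonal(scaled))
```

Using `Fraction` avoids floating-point round-off, so the comparison with the identity matrix can be exact.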


EXAMPLE 2 Rotation and Reflection Matrices are Orthogonal


Recall from Table 5 of Section 4.9 that the standard matrix for the counterclockwise rotation of $R^2$ through an angle $\theta$ is

\[ A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]

This matrix is orthogonal for all choices of $\theta$ since

\[ A^T A = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

We leave it for you to verify that the reflection matrices in Tables 1 and 2 and the rotation matrices in Table 6 of
Section 4.9 are all orthogonal.


Observe that for the orthogonal matrices in Example | and Example 2, both the row vectors and the column vectors form orthonormal sets with 
respect to the Euclidean inner product. This is a consequence of the following theorem. 


THEOREM 7.1.1 


The following are equivalent for an $n \times n$ matrix A.
(a) A is orthogonal.
(b) The row vectors of A form an orthonormal set in $R^n$ with the Euclidean inner product.
(c) The column vectors of A form an orthonormal set in $R^n$ with the Euclidean inner product.


Proof We will prove the equivalence of (a) and (b) and leave the equivalence of (a) and (c) as an exercise. 


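(a) ⇔ (b) The entry in the ith row and jth column of the matrix product $AA^T$ is the dot product of the ith row vector of A and the jth column
vector of $A^T$ (see Formula 5 of Section 1.3). But except for a difference in form, the jth column vector of $A^T$ is the jth row vector of A. Thus, if the
row vectors of A are $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n$, then the matrix product $AA^T$ can be expressed as

\[ AA^T = \begin{bmatrix} \mathbf{r}_1 \cdot \mathbf{r}_1 & \mathbf{r}_1 \cdot \mathbf{r}_2 & \cdots & \mathbf{r}_1 \cdot \mathbf{r}_n \\ \mathbf{r}_2 \cdot \mathbf{r}_1 & \mathbf{r}_2 \cdot \mathbf{r}_2 & \cdots & \mathbf{r}_2 \cdot \mathbf{r}_n \\ \vdots & \vdots & & \vdots \\ \mathbf{r}_n \cdot \mathbf{r}_1 & \mathbf{r}_n \cdot \mathbf{r}_2 & \cdots & \mathbf{r}_n \cdot \mathbf{r}_n \end{bmatrix} \]

[see Formula 28 of Section 3.2]. Thus, it follows that $AA^T = I$ if and only if

\[ \mathbf{r}_1 \cdot \mathbf{r}_1 = \mathbf{r}_2 \cdot \mathbf{r}_2 = \cdots = \mathbf{r}_n \cdot \mathbf{r}_n = 1 \]

and

\[ \mathbf{r}_i \cdot \mathbf{r}_j = 0 \quad \text{when } i \ne j \]

which are true if and only if $\{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n\}$ is an orthonormal set in $R^n$.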


WARNING 


Note that an orthogonal matrix is one with orthonormal rows and columns—not simply orthogonal rows and columns. 


The following theorem lists three more fundamental properties of orthogonal matrices. The proofs are all straightforward and are left as exercises. 


THEOREM 7.1.2 


(a) The inverse of an orthogonal matrix is orthogonal. 
(b) A product of orthogonal matrices is orthogonal. 
(c) If A is orthogonal, then $\det(A) = 1$ or $\det(A) = -1$.


EXAMPLE 3 det(A) = ±1 for an Orthogonal Matrix A


The matrix 


is orthogonal since its row (and column) vectors form orthonormal sets in $R^2$ with the Euclidean inner product. We leave it for you
to verify that $\det(A) = 1$ and that interchanging the rows produces an orthogonal matrix whose determinant is $-1$.


Orthogonal Matrices as Linear Operators 


We observed in Example 2 that the standard matrices for the basic reflection and rotation operators on $R^2$ and $R^3$ are orthogonal. The next theorem
will explain why this is so. 


THEOREM 7.1.3 


If A is an $n \times n$ matrix, then the following are equivalent.
(a) A is orthogonal.
(b) $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all x in $R^n$.
(c) $A\mathbf{x} \cdot A\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$ for all x and y in $R^n$.


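Proof We will prove the sequence of implications (a) ⇒ (b) ⇒ (c) ⇒ (a).


(a) ⇒ (b) Assume that A is orthogonal, so that $A^T A = I$. It follows from Formula 26 of Section 3.2 that

\[ \|A\mathbf{x}\| = (A\mathbf{x} \cdot A\mathbf{x})^{1/2} = (\mathbf{x} \cdot A^T A \mathbf{x})^{1/2} = (\mathbf{x} \cdot \mathbf{x})^{1/2} = \|\mathbf{x}\| \]

(b) ⇒ (c) Assume that $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all x in $R^n$. From Theorem 3.2.7 we have

\[ \begin{aligned} A\mathbf{x} \cdot A\mathbf{y} &= \tfrac{1}{4} \|A\mathbf{x} + A\mathbf{y}\|^2 - \tfrac{1}{4} \|A\mathbf{x} - A\mathbf{y}\|^2 = \tfrac{1}{4} \|A(\mathbf{x} + \mathbf{y})\|^2 - \tfrac{1}{4} \|A(\mathbf{x} - \mathbf{y})\|^2 \\ &= \tfrac{1}{4} \|\mathbf{x} + \mathbf{y}\|^2 - \tfrac{1}{4} \|\mathbf{x} - \mathbf{y}\|^2 = \mathbf{x} \cdot \mathbf{y} \end{aligned} \]

(c) ⇒ (a) Assume that $A\mathbf{x} \cdot A\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$ for all x and y in $R^n$. It follows from Formula 26 of Section 3.2 that

\[ \mathbf{x} \cdot \mathbf{y} = \mathbf{x} \cdot A^T A \mathbf{y} \]

which can be rewritten as $\mathbf{x} \cdot (A^T A \mathbf{y} - \mathbf{y}) = 0$ or as

\[ \mathbf{x} \cdot (A^T A - I)\mathbf{y} = 0 \]

Since this equation holds for all x in $R^n$, it holds in particular if $\mathbf{x} = (A^T A - I)\mathbf{y}$, so

\[ (A^T A - I)\mathbf{y} \cdot (A^T A - I)\mathbf{y} = 0 \]

Thus, it follows from the positivity axiom for inner products that

\[ (A^T A - I)\mathbf{y} = \mathbf{0} \]

Since this equation is satisfied by every vector y in $R^n$, it must be that $A^T A - I$ is the zero matrix (why?) and hence that $A^T A = I$. Thus, A is
orthogonal.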


Theorem 7.1.3 has a useful geometric interpretation when considered from the viewpoint of matrix transformations: If A is an orthogonal matrix
and $T_A: R^n \to R^n$ is multiplication by A, then we will call $T_A$ an orthogonal operator on $R^n$. It follows from parts (a) and (b) of Theorem 7.1.3
that the orthogonal operators on $R^n$ are precisely those operators that leave the lengths of all vectors unchanged. This explains why, in Example 2,
we found the standard matrices for the basic reflections and rotations of $R^2$ and $R^3$ to be orthogonal.
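
Parts (b) and (c) of Theorem 7.1.3 can be observed numerically for a rotation matrix. The following is a small sketch; the angle 0.7 and the two test vectors are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math


def rotation(theta):
    """Standard matrix for counterclockwise rotation of R^2 through theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matvec(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    """Euclidean inner product."""
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """Euclidean norm."""
    return math.sqrt(dot(x, x))


A = rotation(0.7)          # arbitrary angle
x = [3.0, -4.0]            # arbitrary vectors
y = [1.0, 2.0]
Ax, Ay = matvec(A, x), matvec(A, y)

print(norm(x), norm(Ax))         # lengths agree up to round-off
print(dot(x, y), dot(Ax, Ay))    # dot products agree up to round-off
```

A numerical check on a few vectors is evidence rather than proof; the theorem is what guarantees the equalities for every x and y.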


Parts (a) and (c) of Theorem 7.1.3 imply that orthogonal 
operators leave the angle between two vectors unchanged. Why? 


Change of Orthonormal Basis 


Orthonormal bases for inner product spaces are convenient because, as the following theorem shows, many familiar formulas hold for such bases. 
We leave the proof as an exercise. 


THEOREM 7.1.4 


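If S is an orthonormal basis for an n-dimensional inner product space V, and if

\[ (\mathbf{u})_S = (u_1, u_2, \ldots, u_n) \quad \text{and} \quad (\mathbf{v})_S = (v_1, v_2, \ldots, v_n) \]

then:
(a) $\|\mathbf{u}\| = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}$
(b) $d(\mathbf{u}, \mathbf{v}) = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_n - v_n)^2}$
(c) $\langle \mathbf{u}, \mathbf{v} \rangle = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n$


Remark Note that the three parts of Theorem 7.1.4 can be expressed as

\[ \|\mathbf{u}\| = \|(\mathbf{u})_S\|, \qquad d(\mathbf{u}, \mathbf{v}) = d((\mathbf{u})_S, (\mathbf{v})_S), \qquad \langle \mathbf{u}, \mathbf{v} \rangle = \langle (\mathbf{u})_S, (\mathbf{v})_S \rangle \]

where the norm, distance, and inner product on the left sides are relative to the inner product on V and on the right sides are relative to the
Euclidean inner product on $R^n$.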


Transitions between orthonormal bases for an inner product space are of special importance in geometry and various applications. The following 
theorem, whose proof is deferred to the end of this section, is concerned with transitions of this type. 


THEOREM 7.1.5 


Let V be a finite-dimensional inner product space. If P is the transition matrix from one orthonormal basis for V to another orthonormal 
basis for V, then P is an orthogonal matrix. 


EXAMPLE 4 Rotation of Axes in 2-Space


In many problems a rectangular xy-coordinate system is given, and a new $x'y'$-coordinate system is obtained by rotating the
xy-system counterclockwise about the origin through an angle $\theta$. When this is done, each point Q in the plane has two sets of
coordinates: coordinates $(x, y)$ relative to the xy-system and coordinates $(x', y')$ relative to the $x'y'$-system (Figure 7.1.1a).


Figure 7.1.1 (panels a to d)


By introducing unit vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ along the positive x- and y-axes and unit vectors $\mathbf{u}_1'$ and $\mathbf{u}_2'$ along the positive $x'$- and $y'$-axes,
we can regard this rotation as a change from an old basis $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ to a new basis $B' = \{\mathbf{u}_1', \mathbf{u}_2'\}$ (Figure 7.1.1b). Thus, the new
coordinates $(x', y')$ and the old coordinates $(x, y)$ of a point Q will be related by

\[ \begin{bmatrix} x \\ y \end{bmatrix} = P \begin{bmatrix} x' \\ y' \end{bmatrix} \tag{2} \]

where P is the transition matrix from $B'$ to B. To find P we must determine the coordinate matrices of the new basis vectors $\mathbf{u}_1'$ and $\mathbf{u}_2'$
relative to the old basis. As indicated in Figure 7.1.1c, the components of $\mathbf{u}_1'$ in the old basis are $\cos\theta$ and $\sin\theta$, so

\[ [\mathbf{u}_1']_B = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \]

Similarly, from Figure 7.1.1d, we see that the components of $\mathbf{u}_2'$ in the old basis are $\cos(\theta + \pi/2) = -\sin\theta$ and
$\sin(\theta + \pi/2) = \cos\theta$, so

\[ [\mathbf{u}_2']_B = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix} \]

Thus the transition matrix from $B'$ to B is

\[ P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \tag{3} \]

Observe that P is an orthogonal matrix, as expected, since B and $B'$ are orthonormal bases. Thus

\[ P^{-1} = P^T = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \]

so 2 yields

\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{4} \]

or, equivalently,

\[ \begin{aligned} x' &= x \cos\theta + y \sin\theta \\ y' &= -x \sin\theta + y \cos\theta \end{aligned} \tag{5} \]

These are sometimes called the rotation equations for $R^2$.


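EXAMPLE 5 Rotation of Axes in 2-Space


Use form 4 of the rotation equations for $R^2$ to find the new coordinates of the point $Q(2, -1)$ if the coordinate axes of a rectangular
coordinate system are rotated through an angle of $\theta = \pi/4$.


Solution Since

\[ \sin\frac{\pi}{4} = \cos\frac{\pi}{4} = \frac{1}{\sqrt{2}} \]

the equation in 4 becomes

\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]

Thus, if the old coordinates of the point Q are $(x, y) = (2, -1)$, then

\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{3}{\sqrt{2}} \end{bmatrix} \]

so the new coordinates of Q are $(x', y') = \left( \frac{1}{\sqrt{2}}, -\frac{3}{\sqrt{2}} \right)$.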


Remark Observe that the coefficient matrix in (4) is the same as the standard matrix for the linear operator that rotates the vectors of $R^2$ through the angle $-\theta$ (see margin note for Table 5 of Section 4.9). This is to be expected since rotating the coordinate axes through the angle $\theta$ with the vectors of $R^2$ kept fixed has the same effect as rotating the vectors in $R^2$ through the angle $-\theta$ with the axes kept fixed.
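The remark can be checked numerically. The following is a minimal sketch, not part of the text: the function names `rotate_axes` and `rotate_vector` and the sample point are illustrative assumptions. It applies the rotation equations for a rotation of the axes through $\theta$ and compares the result with rotating the vector itself through $-\theta$.

```python
import math

def rotate_axes(point, theta):
    """New (x', y') coordinates of `point` after the axes are rotated through theta."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    # Rotation equations: x' = x cos(theta) + y sin(theta), y' = -x sin(theta) + y cos(theta)
    return (x * c + y * s, -x * s + y * c)

def rotate_vector(point, phi):
    """Coordinates of the vector `point` after it is rotated through phi (axes kept fixed)."""
    x, y = point
    c, s = math.cos(phi), math.sin(phi)
    return (x * c - y * s, x * s + y * c)

theta = math.pi / 4
q = (2.0, -1.0)  # sample point, chosen here only for illustration

new_coords = rotate_axes(q, theta)       # axes rotated through theta
same_coords = rotate_vector(q, -theta)   # vector rotated through -theta
print(new_coords, same_coords)
```

Both calls return the same pair, approximately $(1/\sqrt{2}, -3/\sqrt{2})$ for this sample point.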


EXAMPLE 6 Application to Rotation of Axes in 3-Space 


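Suppose that a rectangular xyz-coordinate system is rotated around its z-axis counterclockwise (looking down the positive z-axis) through an angle $\theta$ (Figure 7.1.2). If we introduce unit vectors $\mathbf{u}_1$, $\mathbf{u}_2$, and $\mathbf{u}_3$ along the positive x-, y-, and z-axes and unit vectors $\mathbf{u}_1'$, $\mathbf{u}_2'$, and $\mathbf{u}_3'$ along the positive x'-, y'-, and z'-axes, we can regard the rotation as a change from the old basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ to the new basis $B' = \{\mathbf{u}_1', \mathbf{u}_2', \mathbf{u}_3'\}$. In light of Example 4, it should be evident that

\[ [\mathbf{u}_1']_B = \begin{bmatrix} \cos\theta \\ \sin\theta \\ 0 \end{bmatrix} \quad \text{and} \quad [\mathbf{u}_2']_B = \begin{bmatrix} -\sin\theta \\ \cos\theta \\ 0 \end{bmatrix} \]

Moreover, since $\mathbf{u}_3'$ extends 1 unit up the positive z'-axis,

\[ [\mathbf{u}_3']_B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \]

Figure 7.1.2

It follows that the transition matrix from B' to B is

\[ P = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

and the transition matrix from B to B' is

\[ P^{-1} = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

(verify). Thus, the new coordinates $(x', y', z')$ of a point Q can be computed from its old coordinates $(x, y, z)$ by

\[ \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]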
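The "(verify)" step in this example can be sketched in code. This is an illustrative check only: the helper names (`transition_matrix_z`, `matmul`, `matvec`), the angle, and the sample point are assumptions, not taken from the text. It builds the transition matrix P for a rotation about the z-axis, takes its transpose as the candidate inverse, and confirms that the product is the identity.

```python
import math

def transition_matrix_z(theta):
    """Transition matrix from B' to B for a rotation of the axes about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(m, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

theta = math.pi / 3                      # sample angle (assumption)
P = transition_matrix_z(theta)
P_inv = transpose(P)                     # P is orthogonal, so its inverse is its transpose
identity_check = matmul(P_inv, P)        # should be the 3 x 3 identity up to rounding

old = [1.0, 2.0, 3.0]                    # sample old coordinates (x, y, z)
new = matvec(P_inv, old)                 # new coordinates (x', y', z')
back = matvec(P, new)                    # converting back recovers the old coordinates
print(new, back)
```

Note that the z-coordinate is unchanged, as expected for a rotation about the z-axis.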


OPTIONAL 


We conclude this section with an optional proof of Theorem 7.1.5. 


Proof of Theorem 7.1.5 Assume that V is an n-dimensional inner product space and that P is the transition matrix from an orthonormal basis B' to an orthonormal basis B. We will denote the norm relative to the inner product on V by the symbol $\| \cdot \|_V$ to distinguish it from the norm relative to the Euclidean inner product on $R^n$, which we will denote by $\| \cdot \|$.


Recall that $(\mathbf{u})_S$ denotes a coordinate vector expressed in comma-delimited form whereas $[\mathbf{u}]_S$ denotes a coordinate vector expressed in column form.


To prove that P is orthogonal, we will use Theorem 7.1.3 and show that $\|P\mathbf{x}\| = \|\mathbf{x}\|$ for every vector $\mathbf{x}$ in $R^n$. As a first step in this direction, recall from Theorem 7.1.4a that for any orthonormal basis for V the norm of any vector $\mathbf{u}$ in V is the same as the norm of its coordinate vector with respect to the Euclidean inner product, that is

\[ \|\mathbf{u}\|_V = \|[\mathbf{u}]_{B'}\| = \|[\mathbf{u}]_B\| \]

or

\[ \|\mathbf{u}\|_V = \|[\mathbf{u}]_{B'}\| = \|P[\mathbf{u}]_{B'}\| \tag{6} \]

Now let $\mathbf{x}$ be any vector in $R^n$, and let $\mathbf{u}$ be the vector in V whose coordinate vector with respect to the basis B' is $\mathbf{x}$; that is, $[\mathbf{u}]_{B'} = \mathbf{x}$. Thus, from (6),

\[ \|\mathbf{u}\|_V = \|\mathbf{x}\| = \|P\mathbf{x}\| \]

which proves that P is orthogonal.


Concept Review

• Orthogonal matrix

• Orthogonal operator

• Properties of orthogonal matrices

• Geometric properties of an orthogonal operator

• Properties of transition matrices from one orthonormal basis to another


Skills

• Be able to identify an orthogonal matrix.

• Know the possible values for the determinant of an orthogonal matrix.

• Find the new coordinates of a point resulting from a rotation of axes.


Exercise Set 7.1 


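1. (a) Show that the matrix

\[ A = \begin{bmatrix} \frac{4}{5} & 0 & -\frac{3}{5} \\ -\frac{9}{25} & \frac{4}{5} & -\frac{12}{25} \\ \frac{12}{25} & \frac{3}{5} & \frac{16}{25} \end{bmatrix} \]

is orthogonal in three ways: by calculating $A^T A$, by using part (b) of Theorem 7.1.1, and by using part (c) of Theorem 7.1.1.


(b) Find the inverse of the matrix A in part (a).


Answer:

(b) \[ A^{-1} = \begin{bmatrix} \frac{4}{5} & -\frac{9}{25} & \frac{12}{25} \\ 0 & \frac{4}{5} & \frac{3}{5} \\ -\frac{3}{5} & -\frac{12}{25} & \frac{16}{25} \end{bmatrix} \]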


2. (a) Show that the matrix 


l 
| 
We WIP wp 


WINS WIN We 


WIP bof Lolpo 


is orthogonal. 


(b) Let $T: R^3 \to R^3$ be multiplication by the matrix A in part (a). Find $T(\mathbf{x})$ for the vector $\mathbf{x} = (-2, 3, 5)$. Using the Euclidean inner product on $R^3$, verify that $\|T(\mathbf{x})\| = \|\mathbf{x}\|$.


3. Determine which of the following matrices are orthogonal. For those that are orthogonal, find the inverse. 


(a) | 1 0 
0 1 
(b) }_1 __1 
y2 ¥2 
ety 21 
y2 2 
© lo 7 Lb 
y2 
10 0 
1 
00 — 
y2 
4 a Ce ORR 
f2 yo ¥3 
@: el, a 
6 ¥3 
i> calls, 2 
v2 yo ¥3 
@m)i 12 1 1 
2 2. ote 12 
ts 25, <1. walk 
2 6 6 6 
ee ae 
2 6 6 6 
le Ate at 
2 6 6 6 
(f) }1 0 0.0 
1 1 
0 — -= 0 
3 2 
1 
0 — 01 
3 
1 1 
0 — = 0 
(3 2 
Answer: 
(a) |1 0 
0 1 
(b) Al 
2 
= ale 
y2 


sae Sale 
ak ak St 


(e) 


I 
Ale Ale vAlrA ple 


Ale Al Ale wl 


AlA Ale Ale ple 


4. Prove that if A is orthogonal, then $A^T$ is orthogonal.


5. Verify that the reflection matrices in Tables 1 and 2 of Section 4.9 are orthogonal.


6. Let a rectangular x'y'-coordinate system be obtained by rotating a rectangular xy-coordinate system counterclockwise through the angle $\theta = 3\pi/4$.


(a) Find the x'y'-coordinates of the point whose xy-coordinates are $(-2, 6)$.


(b) Find the xy-coordinates of the point whose x'y'-coordinates are $(5, 2)$.


7. Repeat Exercise 6 with $\theta = \pi/3$.


Answer:


(a) $\left( -1 + 3\sqrt{3},\ 3 + \sqrt{3} \right)$


(b) $\left( \frac{5}{2} - \sqrt{3},\ \frac{5}{2}\sqrt{3} + 1 \right)$


8. Let a rectangular x'y'z'-coordinate system be obtained by rotating a rectangular xyz-coordinate system counterclockwise about the z-axis (looking down the z-axis) through the angle $\theta = \pi/4$.


(a) Find the x'y'z'-coordinates of the point whose xyz-coordinates are $(-1, 2, 5)$.


(b) Find the xyz-coordinates of the point whose x'y'z'-coordinates are $(1, 6, -3)$.


9. Repeat Exercise 8 for a rotation of $\theta = \pi/3$ counterclockwise about the y-axis (looking along the positive y-axis toward the origin).


Answer:

(a) $\left( -\frac{1}{2} - \frac{5}{2}\sqrt{3},\ 2,\ \frac{5}{2} - \frac{1}{2}\sqrt{3} \right)$

(b) $\left( \frac{1}{2} - \frac{3}{2}\sqrt{3},\ 6,\ -\frac{3}{2} - \frac{1}{2}\sqrt{3} \right)$


10. Repeat Exercise 8 for a rotation of $\theta = 3\pi/4$ counterclockwise about the x-axis (looking along the positive x-axis toward the origin).


11. (a) A rectangular x'y'z'-coordinate system is obtained by rotating an xyz-coordinate system counterclockwise about the y-axis through an angle $\theta$ (looking along the positive y-axis toward the origin). Find a matrix A such that

\[ \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = A \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]

where $(x, y, z)$ and $(x', y', z')$ are the coordinates of the same point in the xyz- and x'y'z'-systems, respectively.


(b) Repeat part (a) for a rotation about the x-axis.


Answer:

(a) \[ A = \begin{bmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix} \]

(b) \[ A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{bmatrix} \]


12. A rectangular x''y''z''-coordinate system is obtained by first rotating a rectangular xyz-coordinate system 60° counterclockwise about the z-axis (looking down the positive z-axis) to obtain an x'y'z'-coordinate system, and then rotating the x'y'z'-coordinate system 45° counterclockwise about the y'-axis (looking along the positive y'-axis toward the origin). Find a matrix A such that

\[ \begin{bmatrix} x'' \\ y'' \\ z'' \end{bmatrix} = A \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]

where $(x, y, z)$ and $(x'', y'', z'')$ are the xyz- and x''y''z''-coordinates of the same point.


13. What conditions must a and b satisfy for the matrix

\[ \begin{bmatrix} a + b & b - a \\ a - b & b + a \end{bmatrix} \]

to be orthogonal?


Answer:

\[ a^2 + b^2 = \frac{1}{2} \]


14. Prove that a $2 \times 2$ orthogonal matrix A has only one of two possible forms:

\[ A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad \text{or} \quad A = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix} \]

where $0 \le \theta < 2\pi$. [Hint: Start with a general $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, and use the fact that the column vectors form an orthonormal set in $R^2$.]


15. (a) Use the result in Exercise 14 to prove that multiplication by a $2 \times 2$ orthogonal matrix is either a rotation or a reflection about the x-axis followed by a rotation.


(b) Prove that multiplication by A is a rotation if $\det(A) = 1$ and a reflection followed by a rotation if $\det(A) = -1$.


16. Use the result in Exercise 15 to determine whether multiplication by A is a rotation or a reflection about the x-axis followed by a rotation. Find the angle of rotation in either case.


@u [2-32 
-| 
v2 yz 
® [4a B 
ge) 22 
3 1 
2 2 


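17. Find a, b, and c for which the matrix

\[ \begin{bmatrix} a & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ b & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ c & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{bmatrix} \]

is orthogonal. Are the values of a, b, and c unique? Explain.

Answer:

The only possibilities are $a = 0$, $b = -\frac{2}{\sqrt{6}}$, $c = \frac{1}{\sqrt{3}}$ or $a = 0$, $b = \frac{2}{\sqrt{6}}$, $c = -\frac{1}{\sqrt{3}}$.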
18. The result in Exercise 15 has an analog for $3 \times 3$ orthogonal matrices: It can be proved that multiplication by a $3 \times 3$ orthogonal matrix A is a rotation about some axis if $\det(A) = 1$ and is a rotation about some axis followed by a reflection about some coordinate plane if $\det(A) = -1$. Determine whether multiplication by A is a rotation or a rotation followed by a reflection.
(a) 
A=|= 


AP ala ~I[bo 
AA ~Ifo ~I|Ppo 
Pon) (enon | enon I Coad 


(b) 


ts 
ll 
AIA ~Lo [Po 
I 
I 
ado IN Ia 


~J|r ~~ ~I bo 


19. Use the fact stated in Exercise 18 and part (b) of Theorem 7.1.2 to show that a composition of rotations can always be accomplished by a single rotation about some appropriate axis.

20. Prove the equivalence of statements (a) and (c) in Theorem 7.1.1.

21. A linear operator on $R^2$ is called rigid if it does not change the lengths of vectors, and it is called angle preserving if it does not change the angle between nonzero vectors.

(a) Name two different types of linear operators that are rigid.

(b) Name two different types of linear operators that are angle preserving.

(c) Are there any linear operators on $R^2$ that are rigid and not angle preserving? Angle preserving and not rigid? Justify your answer.

Answer:

(a) Rotations about the origin, reflections about any line through the origin, and any combination of these

(b) Rotations about the origin, dilations, contractions, reflections about lines through the origin, and combinations of these

(c) No; dilations and contractions
True-False Exercises 


In parts (a)-(h) determine whether the statement is true or false, and justify your answer. 


(a) The matrix $\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$ is orthogonal.
Answer: 
False 
(b) The matrix E “1 is orthogonal. 
Answer: 
False 


(c) An $m \times n$ matrix A is orthogonal if $A^T A = I$.
Answer: 


False 


(d) A square matrix whose columns form an orthogonal set is orthogonal. 
Answer: 


False 


(e) Every orthogonal matrix is invertible. 
Answer: 


True 


(f) If A is an orthogonal matrix, then $A^2$ is orthogonal and $(\det A)^2 = 1$.


Answer: 


True 


(g) Every eigenvalue of an orthogonal matrix has absolute value 1. 


Answer: 


True 


(h) If A is a square matrix and $\|A\mathbf{u}\| = 1$ for all unit vectors $\mathbf{u}$, then A is orthogonal.
Answer: 


True 




7.2 Orthogonal Diagonalization 


In this section we will be concerned with the problem of diagonalizing a symmetric matrix A. As we will see, this problem is closely related to that of finding an orthonormal basis for $R^n$ that consists of eigenvectors of A. Problems of this type are important because many of the matrices that arise in applications are symmetric.


The Orthogonal Diagonalization Problem


In Definition 1 of Section 5.2 we defined two square matrices, A and B, to be similar if there is an invertible matrix P such that $P^{-1}AP = B$. In this section we will be concerned with the special case in which it is possible to find an orthogonal matrix P for which this relationship holds.


We begin with the following definition.


DEFINITION 1


If A and B are square matrices, then we say that A and B are orthogonally similar if there is an orthogonal matrix P such that $P^T A P = B$.


If A is orthogonally similar to some diagonal matrix, say

\[ P^T A P = D \]

then we say that A is orthogonally diagonalizable and that P orthogonally diagonalizes A.
Our first goal in this section is to determine what conditions a matrix must satisfy to be orthogonally diagonalizable. As a first step, observe that there is no hope of orthogonally diagonalizing a matrix that is not symmetric. To see why this is so, suppose that

\[ P^T A P = D \tag{1} \]

where P is an orthogonal matrix and D is a diagonal matrix. Multiplying the left side of (1) by P, the right side by $P^T$, and then using the fact that $PP^T = P^T P = I$, we can rewrite this equation as

\[ A = PDP^T \tag{2} \]

Now transposing both sides of this equation and using the fact that a diagonal matrix is the same as its transpose we obtain

\[ A^T = (PDP^T)^T = (P^T)^T D^T P^T = PDP^T = A \]

so A must be symmetric.


Conditions for Orthogonal Diagonalizability 


The following theorem shows that every symmetric matrix is, in fact, orthogonally diagonalizable. In this theorem, and for the remainder of this section, orthogonal will mean orthogonal with respect to the Euclidean inner product on $R^n$.


THEOREM 7.2.1


If A is an $n \times n$ matrix, then the following are equivalent.
(a) A is orthogonally diagonalizable.
(b) A has an orthonormal set of n eigenvectors.
(c) A is symmetric.


Proof 


(a) ⇒ (b) Since A is orthogonally diagonalizable, there is an orthogonal matrix P such that $P^{-1}AP$ is diagonal. As shown in the proof of Theorem 5.2.1, the n column vectors of P are eigenvectors of A. Since P is orthogonal, these column vectors are orthonormal, so A has n orthonormal eigenvectors.


(b) ⇒ (a) Assume that A has an orthonormal set of n eigenvectors $\{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n\}$. As shown in the proof of Theorem 5.2.1, the matrix P with these eigenvectors as columns diagonalizes A. Since these eigenvectors are orthonormal, P is orthogonal and thus orthogonally diagonalizes A.


(a) ⇒ (c) In the proof that (a) ⇒ (b) we showed that an orthogonally diagonalizable $n \times n$ matrix A is orthogonally diagonalized by an $n \times n$ matrix P whose columns form an orthonormal set of eigenvectors of A. Let D be the diagonal matrix

\[ D = P^T A P \]

from which it follows that

\[ A = PDP^T \]

Thus,

\[ A^T = (PDP^T)^T = PD^TP^T = PDP^T = A \]

which shows that A is symmetric.


(c) ⇒ (a) The proof of this part is beyond the scope of this text and will be omitted.


Properties of Symmetric Matrices 


Our next goal is to devise a procedure for orthogonally diagonalizing a symmetric matrix, but before we can do so, we need 
the following critical theorem about eigenvalues and eigenvectors of symmetric matrices. 


THEOREM 7.2.2 


If A is a symmetric matrix, then: 
(a) The eigenvalues of A are all real numbers. 


(b) Eigenvectors from different eigenspaces are orthogonal. 


Part (a), which requires results about complex vector spaces, will be discussed in Section 7.5. 


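Proof (b) Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be eigenvectors corresponding to distinct eigenvalues $\lambda_1$ and $\lambda_2$ of the matrix A. We want to show that $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$. Our proof of this involves the trick of starting with the expression $A\mathbf{v}_1 \cdot \mathbf{v}_2$. It follows from Formula 26 of Section 3.2 and the symmetry of A that

\[ A\mathbf{v}_1 \cdot \mathbf{v}_2 = \mathbf{v}_1 \cdot A^T\mathbf{v}_2 = \mathbf{v}_1 \cdot A\mathbf{v}_2 \tag{3} \]

But $\mathbf{v}_1$ is an eigenvector of A corresponding to $\lambda_1$, and $\mathbf{v}_2$ is an eigenvector of A corresponding to $\lambda_2$, so (3) yields the relationship

\[ \lambda_1 \mathbf{v}_1 \cdot \mathbf{v}_2 = \mathbf{v}_1 \cdot \lambda_2 \mathbf{v}_2 \]

which can be rewritten as

\[ (\lambda_1 - \lambda_2)(\mathbf{v}_1 \cdot \mathbf{v}_2) = 0 \tag{4} \]

But $\lambda_1 - \lambda_2 \neq 0$, since $\lambda_1$ and $\lambda_2$ were assumed distinct. Thus, it follows from (4) that $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$.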
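The role that symmetry plays in this proof can be illustrated numerically. In the sketch below, the two sample matrices and their eigenvectors are assumptions chosen for illustration (they do not come from the text). For the symmetric matrix the two sides of the key identity agree, which forces the eigenvectors to be orthogonal; for the nonsymmetric matrix the identity fails and the eigenvectors are not orthogonal.

```python
def matvec(m, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Symmetric sample: eigenvalue 3 with eigenvector (1, 1), eigenvalue 1 with eigenvector (1, -1).
A = [[2.0, 1.0], [1.0, 2.0]]
v1, v2 = [1.0, 1.0], [1.0, -1.0]
sym_left = dot(matvec(A, v1), v2)    # A v1 . v2 = 3 (v1 . v2)
sym_right = dot(v1, matvec(A, v2))   # v1 . A v2 = 1 (v1 . v2)
sym_dot = dot(v1, v2)                # equal sides with distinct eigenvalues force this to be 0

# Nonsymmetric sample: eigenvalue 2 with eigenvector (1, 0), eigenvalue 3 with eigenvector (1, 1).
B = [[2.0, 1.0], [0.0, 3.0]]
w1, w2 = [1.0, 0.0], [1.0, 1.0]
non_left = dot(matvec(B, w1), w2)    # B w1 . w2
non_right = dot(w1, matvec(B, w2))   # w1 . B w2, no longer equal to the left side
non_dot = dot(w1, w2)                # nonzero: the eigenvectors are not orthogonal

print(sym_left, sym_right, sym_dot)
print(non_left, non_right, non_dot)
```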


Theorem 7.2.2 yields the following procedure for orthogonally diagonalizing a symmetric matrix. 


Orthogonally Diagonalizing an n x n Symmetric Matrix 


Step 1 Find a basis for each eigenspace of A. 
Step 2 Apply the Gram-Schmidt process to each of these bases to obtain an orthonormal basis for each eigenspace. 


Step 3 Form the matrix P whose columns are the vectors constructed in Step 2. This matrix will orthogonally diagonalize A, and the eigenvalues on the diagonal of $D = P^T A P$ will be in the same order as their corresponding eigenvectors in P.


Remark The justification of this procedure should be clear: Theorem 7.2.2 ensures that eigenvectors from different 
eigenspaces are orthogonal, and applying the Gram-Schmidt process ensures that the eigenvectors within the same 
eigenspace are orthonormal. It follows that the entire set of eigenvectors obtained by this procedure will be orthonormal. 


EXAMPLE 1 Orthogonally Diagonalizing a Symmetric Matrix 


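Find an orthogonal matrix P that diagonalizes

\[ A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix} \]

Solution We leave it for you to verify that the characteristic equation of A is

\[ \det(\lambda I - A) = \det \begin{bmatrix} \lambda - 4 & -2 & -2 \\ -2 & \lambda - 4 & -2 \\ -2 & -2 & \lambda - 4 \end{bmatrix} = (\lambda - 2)^2 (\lambda - 8) = 0 \]

Thus, the distinct eigenvalues of A are $\lambda = 2$ and $\lambda = 8$. By the method used in Example 7 of Section 5.1, it can be shown that

\[ \mathbf{u}_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad \mathbf{u}_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \tag{5} \]

form a basis for the eigenspace corresponding to $\lambda = 2$. Applying the Gram-Schmidt process to $\{\mathbf{u}_1, \mathbf{u}_2\}$ yields the following orthonormal eigenvectors (verify):

\[ \mathbf{v}_1 = \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \end{bmatrix} \quad \text{and} \quad \mathbf{v}_2 = \begin{bmatrix} -\frac{1}{\sqrt{6}} \\ -\frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{bmatrix} \tag{6} \]

The eigenspace corresponding to $\lambda = 8$ has

\[ \mathbf{u}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \]

as a basis. Applying the Gram-Schmidt process to $\{\mathbf{u}_3\}$ (i.e., normalizing $\mathbf{u}_3$) yields

\[ \mathbf{v}_3 = \begin{bmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix} \]

Finally, using $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ as column vectors, we obtain

\[ P = \begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix} \]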
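The three-step procedure can be sketched in code for a $3 \times 3$ symmetric matrix with eigenvalues 2 (repeated) and 8. This is only an illustrative check: the eigenspace bases are supplied by hand rather than computed, and the helper names (`gram_schmidt`, `dot`, `matvec`) are assumptions of this sketch. It orthonormalizes each eigenspace basis separately and then forms the entries of $P^T A P$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def gram_schmidt(vectors):
    """Return an orthonormal basis for the span of linearly independent `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = dot(v, q)                       # component of v along q
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis

A = [[4.0, 2.0, 2.0],
     [2.0, 4.0, 2.0],
     [2.0, 2.0, 4.0]]

# Step 1: a basis for each eigenspace (supplied by hand).
basis_2 = [[-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]]     # eigenvalue 2
basis_8 = [[1.0, 1.0, 1.0]]                        # eigenvalue 8

# Step 2: Gram-Schmidt within each eigenspace.
columns = gram_schmidt(basis_2) + gram_schmidt(basis_8)

# Step 3: the columns of P are these vectors; entry (i, j) of P^T A P is p_i . (A p_j).
D = [[dot(columns[i], matvec(A, columns[j])) for j in range(3)] for i in range(3)]
for row in D:
    print([round(entry, 10) for entry in row])
```

Up to rounding, the printed matrix is diagonal with entries 2, 2, 8, in the same order as the eigenvectors were placed in P.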


Spectral Decomposition 


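If A is a symmetric matrix that is orthogonally diagonalized by

\[ P = [\mathbf{u}_1 \ \ \mathbf{u}_2 \ \ \cdots \ \ \mathbf{u}_n] \]

and if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A corresponding to the unit eigenvectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$, then we know that $D = P^T A P$, where D is a diagonal matrix with the eigenvalues in the diagonal positions. It follows from this that the matrix A can be expressed as

\[ A = PDP^T = [\mathbf{u}_1 \ \ \mathbf{u}_2 \ \ \cdots \ \ \mathbf{u}_n] \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \begin{bmatrix} \mathbf{u}_1^T \\ \mathbf{u}_2^T \\ \vdots \\ \mathbf{u}_n^T \end{bmatrix} = [\lambda_1\mathbf{u}_1 \ \ \lambda_2\mathbf{u}_2 \ \ \cdots \ \ \lambda_n\mathbf{u}_n] \begin{bmatrix} \mathbf{u}_1^T \\ \mathbf{u}_2^T \\ \vdots \\ \mathbf{u}_n^T \end{bmatrix} \]

Multiplying out, we obtain the formula

\[ A = \lambda_1 \mathbf{u}_1\mathbf{u}_1^T + \lambda_2 \mathbf{u}_2\mathbf{u}_2^T + \cdots + \lambda_n \mathbf{u}_n\mathbf{u}_n^T \tag{7} \]

which is called a spectral decomposition of A.


Note that each term of the spectral decomposition of A has the form $\lambda\mathbf{u}\mathbf{u}^T$, where $\mathbf{u}$ is a unit eigenvector of A in column form, and $\lambda$ is an eigenvalue of A corresponding to $\mathbf{u}$. Since $\mathbf{u}$ has size $n \times 1$, it follows that the product $\mathbf{u}\mathbf{u}^T$ has size $n \times n$. It can be proved (though we will not do it) that $\mathbf{u}\mathbf{u}^T$ is the standard matrix for the orthogonal projection of $R^n$ on the subspace spanned by the vector $\mathbf{u}$. Accepting this to be so, the spectral decomposition of A tells us that the image of a vector $\mathbf{x}$ under multiplication by a symmetric matrix A can be obtained by projecting $\mathbf{x}$ orthogonally on the lines (one-dimensional subspaces) determined by the eigenvectors of A, then scaling those projections by the eigenvalues, and then adding the scaled projections. Here is an example.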


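EXAMPLE 2 A Geometric Interpretation of a Spectral Decomposition


The matrix

\[ A = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix} \]

has eigenvalues $\lambda_1 = -3$ and $\lambda_2 = 2$ with corresponding eigenvectors

\[ \mathbf{x}_1 = \begin{bmatrix} 1 \\ -2 \end{bmatrix} \quad \text{and} \quad \mathbf{x}_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \]

(verify). Normalizing these basis vectors yields

\[ \mathbf{u}_1 = \begin{bmatrix} \frac{1}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} \end{bmatrix} \quad \text{and} \quad \mathbf{u}_2 = \begin{bmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix} \]

so a spectral decomposition of A is

\[ \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix} = \lambda_1 \mathbf{u}_1\mathbf{u}_1^T + \lambda_2 \mathbf{u}_2\mathbf{u}_2^T = (-3) \begin{bmatrix} \frac{1}{5} & -\frac{2}{5} \\ -\frac{2}{5} & \frac{4}{5} \end{bmatrix} + (2) \begin{bmatrix} \frac{4}{5} & \frac{2}{5} \\ \frac{2}{5} & \frac{1}{5} \end{bmatrix} \tag{8} \]

where, as noted above, the $2 \times 2$ matrices on the right side of (8) are the standard matrices for the orthogonal projections onto the eigenspaces corresponding to $\lambda_1 = -3$ and $\lambda_2 = 2$, respectively.


Now let us see what this spectral decomposition tells us about the image of the vector $\mathbf{x} = (1, 1)$ under multiplication by A. Writing $\mathbf{x}$ in column form, it follows that

\[ A\mathbf{x} = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix} \tag{9} \]

and from (8) that

\[ A\mathbf{x} = (-3) \begin{bmatrix} \frac{1}{5} & -\frac{2}{5} \\ -\frac{2}{5} & \frac{4}{5} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} + (2) \begin{bmatrix} \frac{4}{5} & \frac{2}{5} \\ \frac{2}{5} & \frac{1}{5} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = (-3) \begin{bmatrix} -\frac{1}{5} \\ \frac{2}{5} \end{bmatrix} + (2) \begin{bmatrix} \frac{6}{5} \\ \frac{3}{5} \end{bmatrix} = \begin{bmatrix} \frac{3}{5} \\ -\frac{6}{5} \end{bmatrix} + \begin{bmatrix} \frac{12}{5} \\ \frac{6}{5} \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix} \tag{10} \]

Formulas (9) and (10) provide two different ways of viewing the image of the vector $(1, 1)$ under multiplication by A: Formula (9) tells us directly that the image of this vector is $(3, 0)$, whereas Formula (10) tells us that this image can also be obtained by projecting $(1, 1)$ onto the eigenspaces corresponding to $\lambda_1 = -3$ and $\lambda_2 = 2$ to obtain the vectors $\left( -\frac{1}{5}, \frac{2}{5} \right)$ and $\left( \frac{6}{5}, \frac{3}{5} \right)$, then scaling by the eigenvalues to obtain $\left( \frac{3}{5}, -\frac{6}{5} \right)$ and $\left( \frac{12}{5}, \frac{6}{5} \right)$, and then adding these vectors (see Figure 7.2.1).


Figure 7.2.1 (the image $A\mathbf{x} = (3, 0)$ obtained by adding the scaled projections)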
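The project-scale-add reading of a spectral decomposition can be sketched as follows. This is a minimal illustration, assuming the symmetric matrix with rows (1, 2) and (2, -2), its eigenvalues -3 and 2, and the unit eigenvectors $(1, -2)/\sqrt{5}$ and $(2, 1)/\sqrt{5}$; the helper names are illustrative.

```python
import math

def outer(u, v):
    """The matrix u v^T for column vectors u and v."""
    return [[a * b for b in v] for a in u]

def matvec(m, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

eigenvalues = [-3.0, 2.0]
r5 = math.sqrt(5.0)
unit_eigenvectors = [[1 / r5, -2 / r5], [2 / r5, 1 / r5]]

# Each u u^T is the standard matrix of the orthogonal projection onto span{u}.
projections = [outer(u, u) for u in unit_eigenvectors]

# Rebuild A as the sum of lambda_i * u_i u_i^T.
A_rebuilt = [[sum(lam * proj[i][j] for lam, proj in zip(eigenvalues, projections))
              for j in range(2)] for i in range(2)]

x = [1.0, 1.0]
pieces = [matvec(proj, x) for proj in projections]          # project x onto each eigenspace
scaled = [[lam * c for c in piece] for lam, piece in zip(eigenvalues, pieces)]
image = [sum(components) for components in zip(*scaled)]    # add the scaled projections

print(A_rebuilt)
print(pieces, scaled, image)
```

Up to rounding, `image` agrees with the direct product of the matrix with $(1, 1)$, namely $(3, 0)$.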


The Nondiagonalizable Case 


If A is an $n \times n$ matrix that is not orthogonally diagonalizable, it may still be possible to achieve considerable simplification in the form of $P^T A P$ by choosing the orthogonal matrix P appropriately. We will consider two theorems (without proof) that illustrate this. The first, due to the German mathematician Issai Schur, states that every square matrix A with real entries and real eigenvalues is orthogonally similar to an upper triangular matrix that has the eigenvalues of A on the main diagonal.


THEOREM 7.2.3 Schur's Theorem


If A is an $n \times n$ matrix with real entries and real eigenvalues, then there is an orthogonal matrix P such that $P^T A P$ is an upper triangular matrix of the form

\[ P^T A P = \begin{bmatrix} \lambda_1 & \times & \times & \cdots & \times \\ 0 & \lambda_2 & \times & \cdots & \times \\ 0 & 0 & \lambda_3 & \cdots & \times \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n \end{bmatrix} \tag{11} \]

in which $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of the matrix A repeated according to multiplicity.


Issai Schur (1875-1941) 


Historical Note The life of the German mathematician Issai Schur is a sad reminder of the effect that Nazi policies 
had on Jewish intellectuals during the 1930s. Schur was a brilliant mathematician and a popular lecturer who 
attracted many students and researchers to the University of Berlin, where he worked and taught. His lectures 
sometimes attracted so many students that opera glasses were needed to see him from the back row. Schur's life 
became increasingly difficult under Nazi rule, and in April of 1933 he was forced to “retire” from the university 
under a law that prohibited non-Aryans from holding “civil service” positions. There was an outcry from many of his 
students and colleagues who respected and liked him, but it did not stave off his complete dismissal in 1935. Schur,
who thought of himself as a loyal German, never understood the persecution and humiliation he received at Nazi
hands. He left Germany for Palestine in 1939, a broken man. Lacking in financial resources, he had to sell his 
beloved mathematics books and lived in poverty until his death in 1941. 

[Image: Courtesy Electronic Publishing Services, Inc., New York City] 


It is common to denote the upper triangular matrix in (11) by S (for Schur), in which case that equation can be rewritten as

\[ A = PSP^T \tag{12} \]

which is called a Schur decomposition of A.

The next theorem, due to the German mathematician and engineer Karl Hessenberg (1904-1959), states that every square matrix with real entries is orthogonally similar to a matrix in which each entry below the first subdiagonal is zero (Figure 7.2.2). Such a matrix is said to be in upper Hessenberg form.


Figure 7.2.2 (the first subdiagonal of a matrix)


THEOREM 7.2.4 Hessenberg's Theorem


If A is an $n \times n$ matrix, then there is an orthogonal matrix P such that $P^T A P$ is a matrix of the form

\[ P^T A P = \begin{bmatrix} \times & \times & \cdots & \times & \times & \times \\ \times & \times & \cdots & \times & \times & \times \\ 0 & \times & \ddots & \times & \times & \times \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & \times & \times & \times \\ 0 & 0 & \cdots & 0 & \times & \times \end{bmatrix} \tag{13} \]


Note that unlike those in (11), the diagonal entries in (13) are usually not the eigenvalues of A.


It is common to denote the upper Hessenberg matrix in (13) by H (for Hessenberg), in which case that equation can be rewritten as

\[ A = PHP^T \tag{14} \]

which is called an upper Hessenberg decomposition of A.


Remark In many numerical algorithms the initial matrix is first converted to upper Hessenberg form to reduce the amount 
of computation in subsequent parts of the algorithm. Many computer packages have built-in commands for finding Schur and 
Hessenberg decompositions. 
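To make the idea of an upper Hessenberg decomposition concrete, here is a minimal pure-Python sketch that reduces a matrix to upper Hessenberg form with Householder reflections. The text does not say which method computer packages use; Householder reduction is one standard choice and is used here as an assumption, and the function name `hessenberg` and the sample matrix are illustrative. (Library routines such as SciPy's `scipy.linalg.hessenberg` and `scipy.linalg.schur` provide these decompositions directly.)

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def hessenberg(a):
    """Return (P, H) with P orthogonal, H upper Hessenberg, and A = P H P^T."""
    n = len(a)
    h = [[float(entry) for entry in row] for row in a]
    p = identity(n)
    for k in range(n - 2):
        # Entries of column k that lie on or below the first subdiagonal.
        x = [h[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # nothing to eliminate in this column
        v = list(x)
        v[0] += math.copysign(norm_x, x[0])   # sign chosen to avoid cancellation
        norm_v = math.sqrt(sum(t * t for t in v))
        v = [t / norm_v for t in v]
        # Householder reflector Q = I - 2 v v^T acting on rows/columns k+1, ..., n-1.
        q = identity(n)
        for i in range(len(v)):
            for j in range(len(v)):
                q[k + 1 + i][k + 1 + j] -= 2.0 * v[i] * v[j]
        h = matmul(matmul(q, h), q)   # Q is symmetric and orthogonal, so Q^T = Q
        p = matmul(p, q)
    return p, h

A = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [2.0, 0.0, 1.0, 3.0],
     [4.0, 1.0, 0.0, 2.0]]   # sample matrix (assumption)

P, H = hessenberg(A)
A_rebuilt = matmul(matmul(P, H), transpose(P))
orthogonality = matmul(transpose(P), P)
for row in H:
    print([round(entry, 6) for entry in row])
```

Each reflector leaves the product $PHP^T$ unchanged, so the relation $A = PHP^T$ holds at every stage while one more column is cleared below the first subdiagonal.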


Concept Review

• Orthogonally similar matrices

• Orthogonally diagonalizable matrix

• Spectral decomposition (or eigenvalue decomposition)

• Schur decomposition

• Subdiagonal

• Upper Hessenberg form

• Upper Hessenberg decomposition


Skills

• Be able to recognize an orthogonally diagonalizable matrix.

• Know that eigenvalues of symmetric matrices are real numbers.

• Know that for a symmetric matrix eigenvectors from different eigenspaces are orthogonal.

• Be able to orthogonally diagonalize a symmetric matrix.

• Be able to find the spectral decomposition of a symmetric matrix.

• Know the statement of Schur's Theorem.

• Know the statement of Hessenberg's Theorem.


Exercise Set 7.2 


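1. Find the characteristic equation of the given symmetric matrix, and then by inspection determine the dimensions of the eigenspaces.

(a) $\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & -4 & 2 \\ -4 & 1 & -2 \\ 2 & -2 & -2 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$

(d) $\begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}$

(e) $\begin{bmatrix} 4 & 4 & 0 & 0 \\ 4 & 4 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

(f) $\begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & 0 & 0 \\ 0 & 0 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}$

Answer:

(a) $\lambda^2 - 5\lambda = 0$; $\lambda = 0$: one-dimensional; $\lambda = 5$: one-dimensional

(b) $\lambda^3 - 27\lambda - 54 = 0$; $\lambda = 6$: one-dimensional; $\lambda = -3$: two-dimensional

(c) $\lambda^3 - 3\lambda^2 = 0$; $\lambda = 3$: one-dimensional; $\lambda = 0$: two-dimensional

(d) $\lambda^3 - 12\lambda^2 + 36\lambda - 32 = 0$; $\lambda = 2$: two-dimensional; $\lambda = 8$: one-dimensional

(e) $\lambda^4 - 8\lambda^3 = 0$; $\lambda = 0$: three-dimensional; $\lambda = 8$: one-dimensional

(f) $\lambda^4 - 8\lambda^3 + 22\lambda^2 - 24\lambda + 9 = 0$; $\lambda = 1$: two-dimensional; $\lambda = 3$: two-dimensional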


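In Exercises 2-9, find a matrix P that orthogonally diagonalizes A, and determine $P^{-1}AP$.


2. $A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$


3. $A = \begin{bmatrix} 6 & 2\sqrt{3} \\ 2\sqrt{3} & 7 \end{bmatrix}$

Answer:

\[ P = \begin{bmatrix} -\frac{2}{\sqrt{7}} & \frac{\sqrt{3}}{\sqrt{7}} \\ \frac{\sqrt{3}}{\sqrt{7}} & \frac{2}{\sqrt{7}} \end{bmatrix}; \quad P^{-1}AP = \begin{bmatrix} 3 & 0 \\ 0 & 10 \end{bmatrix} \]


4. $A = \begin{bmatrix} 6 & -2 \\ -2 & 3 \end{bmatrix}$


5. $A = \begin{bmatrix} -2 & 0 & -36 \\ 0 & -3 & 0 \\ -36 & 0 & -23 \end{bmatrix}$

Answer:

\[ P = \begin{bmatrix} -\frac{4}{5} & 0 & \frac{3}{5} \\ 0 & 1 & 0 \\ \frac{3}{5} & 0 & \frac{4}{5} \end{bmatrix}; \quad P^{-1}AP = \begin{bmatrix} 25 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & -50 \end{bmatrix} \]


8. $A = \begin{bmatrix} 3 & 1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$


9. $A = \begin{bmatrix} -7 & 24 & 0 & 0 \\ 24 & 7 & 0 & 0 \\ 0 & 0 & -7 & 24 \\ 0 & 0 & 24 & 7 \end{bmatrix}$

Answer:

\[ P = \begin{bmatrix} -\frac{4}{5} & \frac{3}{5} & 0 & 0 \\ \frac{3}{5} & \frac{4}{5} & 0 & 0 \\ 0 & 0 & -\frac{4}{5} & \frac{3}{5} \\ 0 & 0 & \frac{3}{5} & \frac{4}{5} \end{bmatrix}; \quad P^{-1}AP = \begin{bmatrix} -25 & 0 & 0 & 0 \\ 0 & 25 & 0 & 0 \\ 0 & 0 & -25 & 0 \\ 0 & 0 & 0 & 25 \end{bmatrix} \]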

10. Assuming that $b \neq 0$, find a matrix that orthogonally diagonalizes

\[ \begin{bmatrix} a & b \\ b & a \end{bmatrix} \]

11. Prove that if A is any $m \times n$ matrix, then $A^T A$ has an orthonormal set of n eigenvectors.


12. (a) Show that if $\mathbf{v}$ is any $n \times 1$ matrix and I is the $n \times n$ identity matrix, then $I - \mathbf{v}\mathbf{v}^T$ is orthogonally diagonalizable.


(b) Find a matrix P that orthogonally diagonalizes $I - \mathbf{v}\mathbf{v}^T$ if


13. Use the result in Exercise 19 of Section 5.1 to prove Theorem 7.2.2a for $2 \times 2$ symmetric matrices.


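14. Does there exist a $3 \times 3$ symmetric matrix with eigenvalues $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 7$ and corresponding eigenvectors

\[ \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}? \]

If so, find such a matrix; if not, explain why not.


15. Is the converse of Theorem 7.2.2b true? Explain.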
Answer: 


No 


16. Find the spectral decomposition of each matrix.


(b) $\begin{bmatrix} 6 & -2 \\ -2 & 3 \end{bmatrix}$

(c) $\begin{bmatrix} -3 & 1 & 2 \\ 1 & -3 & 2 \\ 2 & 2 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} -2 & 0 & -36 \\ 0 & -3 & 0 \\ -36 & 0 & -23 \end{bmatrix}$


17. Show that if A is a symmetric orthogonal matrix, then 1 and $-1$ are the only possible eigenvalues.

18. (a) Find a $3 \times 3$ symmetric matrix whose eigenvalues are $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 7$ and for which the corresponding eigenvectors are $\mathbf{v}_1 = (0, 1, -1)$, $\mathbf{v}_2 = (1, 0, 0)$, $\mathbf{v}_3 = (0, 1, 1)$.

(b) Is there a $3 \times 3$ symmetric matrix with eigenvalues $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 7$ and corresponding eigenvectors $\mathbf{v}_1 = (0, 1, -1)$, $\mathbf{v}_2 = (1, 0, 0)$, $\mathbf{v}_3 = (1, 1, 1)$? Explain your reasoning.

19. Let A be a diagonalizable matrix with the property that eigenvectors from distinct eigenvalues are orthogonal. Must A be symmetric? Explain your reasoning.

Answer:

Yes

20. Prove: If $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ is an orthonormal basis for $R^n$, and if A can be expressed as

\[ A = c_1 \mathbf{u}_1\mathbf{u}_1^T + c_2 \mathbf{u}_2\mathbf{u}_2^T + \cdots + c_n \mathbf{u}_n\mathbf{u}_n^T \]

then A is symmetric and has eigenvalues $c_1, c_2, \ldots, c_n$.


21. In this exercise we will establish that a matrix A is orthogonally diagonalizable if and only if it is symmetric. We have shown that an orthogonally diagonalizable matrix is symmetric. The harder part is to prove that a symmetric matrix A is orthogonally diagonalizable. We will proceed in two steps: first we will show that A is diagonalizable, and then we will build on that result to show that A is orthogonally diagonalizable.

(a) Assume that A is a symmetric $n \times n$ matrix. One way to prove that A is diagonalizable is to show that for each eigenvalue $\lambda_0$ the geometric multiplicity is equal to the algebraic multiplicity. For this purpose, assume that the geometric multiplicity of $\lambda_0$ is k, let $B_0 = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ be an orthonormal basis for the eigenspace corresponding to $\lambda_0$, extend this to an orthonormal basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ for $R^n$, and let P be the matrix having the vectors of B as columns. As shown in Exercise 34(b) of Section 5.2, the product AP can be written as

\[ AP = P \begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix} \]

Use the fact that B is an orthonormal basis to prove that $X = 0$ [a zero matrix of size $k \times (n - k)$].

(b) It follows from part (a) and Exercise 34(c) of Section 5.2 that A has the same characteristic polynomial as

\[ C = \begin{bmatrix} \lambda_0 I_k & 0 \\ 0 & Y \end{bmatrix} \]

Use this fact and Exercise 34(d) of Section 5.2 to prove that the algebraic multiplicity of $\lambda_0$ is the same as the geometric multiplicity of $\lambda_0$. This establishes that A is diagonalizable.


(c) Use Theorem 7.2.2(b) and the fact that A is diagonalizable to prove that A is orthogonally diagonalizable.


True-False Exercises 


In parts (a)-(g) determine whether the statement is true or false, and justify your answer. 


(a) If A is a square matrix, then $AA^T$ and $A^T A$ are orthogonally diagonalizable.


Answer: 


True 


(b) If $\mathbf{v}_1$ and $\mathbf{v}_2$ are eigenvectors from distinct eigenspaces of a symmetric matrix, then $\|\mathbf{v}_1 + \mathbf{v}_2\|^2 = \|\mathbf{v}_1\|^2 + \|\mathbf{v}_2\|^2$.
Answer: 


True 


(c) Every orthogonal matrix is orthogonally diagonalizable. 
Answer: 


False 


(d) If A is both invertible and orthogonally diagonalizable, then $A^{-1}$ is orthogonally diagonalizable.
Answer: 


True 


(e) Every eigenvalue of an orthogonal matrix has absolute value 1. 
Answer: 


True 


(f) If A is an $n \times n$ orthogonally diagonalizable matrix, then there exists an orthonormal basis for $R^n$ consisting of eigenvectors of A.


Answer:


True


(g) If A is orthogonally diagonalizable, then A has real eigenvalues. 
Answer: 


True 




7.3 Quadratic Forms 


In this section we will use matrix methods to study real-valued functions of several variables in which each term is either the 
square of a variable or the product of two variables. Such functions arise in a variety of applications, including geometry, 
vibrations of mechanical systems, statistics, and electrical engineering. 


Definition of a Quadratic Form 


Expressions of the form

\[ a_1x_1 + a_2x_2 + \cdots + a_nx_n \]

occurred in our study of linear equations and linear systems. If $a_1, a_2, \ldots, a_n$ are treated as fixed constants, then this expression is a real-valued function of the n variables $x_1, x_2, \ldots, x_n$ and is called a linear form on $R^n$. All variables in a linear form occur to the first power and there are no products of variables. Here we will be concerned with quadratic forms on $R^n$, which are functions of the form

\[ a_1x_1^2 + a_2x_2^2 + \cdots + a_nx_n^2 + (\text{all possible terms } a_k x_i x_j \text{ in which } x_i \neq x_j) \]

The terms of the form $a_k x_i x_j$ are called cross product terms. It is common to combine the cross product terms involving $x_i x_j$ with those involving $x_j x_i$ to avoid duplication. Thus, a general quadratic form on $R^2$ would typically be expressed as

\[ a_1x_1^2 + a_2x_2^2 + 2a_3x_1x_2 \tag{1} \]

and a general quadratic form on $R^3$ as

\[ a_1x_1^2 + a_2x_2^2 + a_3x_3^2 + 2a_4x_1x_2 + 2a_5x_1x_3 + 2a_6x_2x_3 \tag{2} \]

If, as usual, we do not distinguish between the number a and the $1 \times 1$ matrix $[a]$, and if we let $\mathbf{x}$ be the column vector of variables, then (1) and (2) can be expressed in matrix form as

\[ \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} a_1 & a_3 \\ a_3 & a_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \mathbf{x}^T A \mathbf{x} \]

\[ \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} a_1 & a_4 & a_5 \\ a_4 & a_2 & a_6 \\ a_5 & a_6 & a_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \mathbf{x}^T A \mathbf{x} \]

(verify). Note that the matrix A in these formulas is symmetric, that its diagonal entries are the coefficients of the squared terms, and its off-diagonal entries are half the coefficients of the cross product terms. In general, if A is a symmetric $n \times n$ matrix and $\mathbf{x}$ is an $n \times 1$ column vector of variables, then we call the function

\[ Q_A(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} \tag{3} \]

the quadratic form associated with A. When convenient, (3) can be expressed in dot product notation as

\[ \mathbf{x}^T A \mathbf{x} = \mathbf{x} \cdot A\mathbf{x} = A\mathbf{x} \cdot \mathbf{x} \tag{4} \]


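In the case where A is a diagonal matrix, the quadratic form $\mathbf{x}^T A \mathbf{x}$ has no cross product terms; for example, if A has diagonal entries $\lambda_1, \lambda_2, \ldots, \lambda_n$, then

\[ \mathbf{x}^T A \mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \lambda_1x_1^2 + \lambda_2x_2^2 + \cdots + \lambda_nx_n^2 \]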


EXAMPLE 1 Expressing Quadratic Forms in Matrix Notation << 


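In each part, express the quadratic form in the matrix notation $\mathbf{x}^T A \mathbf{x}$, where A is symmetric.

(a) $2x^2 + 6xy - 5y^2$

(b) $x_1^2 + 7x_2^2 - 3x_3^2 + 4x_1x_2 - 2x_1x_3 + 8x_2x_3$


Solution The diagonal entries of A are the coefficients of the squared terms, and the off-diagonal entries are half the coefficients of the cross product terms, so

\[ 2x^2 + 6xy - 5y^2 = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 2 & 3 \\ 3 & -5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]

\[ x_1^2 + 7x_2^2 - 3x_3^2 + 4x_1x_2 - 2x_1x_3 + 8x_2x_3 = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 1 & 2 & -1 \\ 2 & 7 & 4 \\ -1 & 4 & -3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \]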

Change of Variable in a Quadratic Form 


There are three important kinds of problems that occur in applications of quadratic forms: 


Problem 1 If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^2$ or $R^3$, what kind of curve or surface is represented by the equation $\mathbf{x}^TA\mathbf{x} = k$?

Problem 2 If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^n$, what conditions must A satisfy for $\mathbf{x}^TA\mathbf{x}$ to have positive values for $\mathbf{x} \neq \mathbf{0}$?

Problem 3 If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^n$, what are its maximum and minimum values if $\mathbf{x}$ is constrained to satisfy $\|\mathbf{x}\| = 1$?


We will consider the first two problems in this section and the third problem in the next section. 


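Many of the techniques for solving these problems are based on simplifying the quadratic form $\mathbf{x}^TA\mathbf{x}$ by making a substitution

$$\mathbf{x} = P\mathbf{y}$$ (5)

that expresses the variables $x_1, x_2, \ldots, x_n$ in terms of new variables $y_1, y_2, \ldots, y_n$. If P is invertible, then we call 5 a change of variable, and if P is orthogonal, then we call 5 an orthogonal change of variable.


If we make the change of variable $\mathbf{x} = P\mathbf{y}$ in the quadratic form $\mathbf{x}^TA\mathbf{x}$, then we obtain
$$\mathbf{x}^TA\mathbf{x} = (P\mathbf{y})^TA(P\mathbf{y}) = \mathbf{y}^TP^TAP\mathbf{y} = \mathbf{y}^T(P^TAP)\mathbf{y}$$ (6)


Since the matrix $B = P^TAP$ is symmetric (verify), the effect of the change of variable is to produce a new quadratic form $\mathbf{y}^TB\mathbf{y}$ in the variables $y_1, y_2, \ldots, y_n$. In particular, if we choose P to orthogonally diagonalize A, then the new quadratic form will be $\mathbf{y}^TD\mathbf{y}$, where D is a diagonal matrix with the eigenvalues of A on the main diagonal; that is,

$$\mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2$$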
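The symmetry of $P^TAP$ that the text leaves as a verification can be checked in one line. The sketch below assumes only that A is symmetric ($A^T = A$) and uses the standard transpose rule $(XY)^T = Y^TX^T$:

```latex
B^{T} = (P^{T}AP)^{T} = P^{T}A^{T}(P^{T})^{T} = P^{T}AP = B
```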


Thus, we have the following result, called the principal axes theorem. 


THEOREM 7.3.1. The Principal Axes Theorem 


If A is a symmetric n × n matrix, then there is an orthogonal change of variable that transforms the quadratic form $\mathbf{x}^TA\mathbf{x}$ into a quadratic form $\mathbf{y}^TD\mathbf{y}$ with no cross product terms. Specifically, if P orthogonally diagonalizes A, then making the change of variable $\mathbf{x} = P\mathbf{y}$ in the quadratic form $\mathbf{x}^TA\mathbf{x}$ yields the quadratic form

$$\mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2$$

in which $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A corresponding to the eigenvectors that form the successive columns of P.


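EXAMPLE 2 An Illustration of the Principal Axes Theorem


Find an orthogonal change of variable that eliminates the cross product terms in the quadratic form $Q = x_1^2 - x_3^2 - 4x_1x_2 + 4x_2x_3$, and express Q in terms of the new variables.


Solution The quadratic form can be expressed in matrix notation as

$$Q = \mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}\begin{bmatrix} 1 & -2 & 0 \\ -2 & 0 & 2 \\ 0 & 2 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$
The characteristic equation of the matrix A is
$$\begin{vmatrix} \lambda - 1 & 2 & 0 \\ 2 & \lambda & -2 \\ 0 & -2 & \lambda + 1 \end{vmatrix} = \lambda^3 - 9\lambda = \lambda(\lambda + 3)(\lambda - 3) = 0$$
so the eigenvalues are $\lambda = 0, -3, 3$. We leave it for you to show that orthonormal bases for the three eigenspaces are
$$\lambda = 0: \begin{bmatrix} \tfrac{2}{3} \\ \tfrac{1}{3} \\ \tfrac{2}{3} \end{bmatrix}, \qquad \lambda = -3: \begin{bmatrix} -\tfrac{1}{3} \\ -\tfrac{2}{3} \\ \tfrac{2}{3} \end{bmatrix}, \qquad \lambda = 3: \begin{bmatrix} -\tfrac{2}{3} \\ \tfrac{2}{3} \\ \tfrac{1}{3} \end{bmatrix}$$
Thus, a substitution $\mathbf{x} = P\mathbf{y}$ that eliminates the cross product terms is
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \tfrac{2}{3} & -\tfrac{1}{3} & -\tfrac{2}{3} \\ \tfrac{1}{3} & -\tfrac{2}{3} & \tfrac{2}{3} \\ \tfrac{2}{3} & \tfrac{2}{3} & \tfrac{1}{3} \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$$


This produces the new quadratic form

$$Q = \mathbf{y}^T(P^TAP)\mathbf{y} = \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix}\begin{bmatrix} 0 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 3 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = -3y_2^2 + 3y_3^2$$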


in which there are no cross product terms. 
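The claims in this example (that P is orthogonal and that $P^TAP$ is the diagonal matrix of eigenvalues 0, -3, 3) can be checked with exact rational arithmetic. The following is a minimal sketch using only the Python standard library; the helper names `matmul` and `transpose` are illustrative choices, and the sign pattern of P is the one for which the columns are eigenvectors of A.

```python
from fractions import Fraction as F

# Matrix of the quadratic form Q = x1^2 - x3^2 - 4*x1*x2 + 4*x2*x3
A = [[1, -2, 0],
     [-2, 0, 2],
     [0, 2, -1]]

# Columns are unit eigenvectors for the eigenvalues 0, -3, 3 (in that order).
P = [[F(2, 3), F(-1, 3), F(-2, 3)],
     [F(1, 3), F(-2, 3), F(2, 3)],
     [F(2, 3), F(2, 3), F(1, 3)]]


def matmul(X, Y):
    """Plain triple-loop matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


PtP = matmul(transpose(P), P)            # should be the identity if P is orthogonal
D = matmul(matmul(transpose(P), A), P)   # should be diag(0, -3, 3)
```

Because the entries are `Fraction` objects, the comparison with the expected diagonal matrix is exact rather than subject to rounding.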


Remark If A is a symmetric n × n matrix, then the quadratic form $\mathbf{x}^TA\mathbf{x}$ is a real-valued function whose range is the set of all possible values for $\mathbf{x}^TA\mathbf{x}$ as $\mathbf{x}$ varies over $R^n$. It can be shown that an orthogonal change of variable $\mathbf{x} = P\mathbf{y}$ does not alter the range of a quadratic form; that is, the set of all values for $\mathbf{x}^TA\mathbf{x}$ as $\mathbf{x}$ varies over $R^n$ is the same as the set of all values for $\mathbf{y}^T(P^TAP)\mathbf{y}$ as $\mathbf{y}$ varies over $R^n$.


Quadratic Forms in Geometry 


Recall that a conic section or conic is a curve that results by cutting a double-napped cone with a plane (Figure 7.3.1). The most 
important conic sections are ellipses, hyperbolas, and parabolas, which result when the cutting plane does not pass through the 
vertex. Circles are special cases of ellipses that result when the cutting plane is perpendicular to the axis of symmetry of the 
cone. If the cutting plane passes through the vertex, then the resulting intersection is called a degenerate conic. The possibilities 
are a point, a pair of intersecting lines, or a single line. 


Figure 7.3.1 (panels: Circle, Ellipse, Parabola, Hyperbola)


Figure 7.3.2 A central conic rotated out of standard position


Quadratic forms in $R^2$ arise naturally in the study of conic sections. For example, it is shown in analytic geometry that an equation of the form

$$ax^2 + 2bxy + cy^2 + dx + ey + f = 0$$ (7)


in which a, b, and c are not all zero, represents a conic section. If $d = e = 0$ in 7, then there are no linear terms, so the equation becomes

$$ax^2 + 2bxy + cy^2 + f = 0$$ (8)


and is said to represent a central conic. These include circles, ellipses, and hyperbolas, but not parabolas. Furthermore, if $b = 0$ in 8, then there is no cross product term (i.e., term involving xy), and the equation

$$ax^2 + cy^2 + f = 0$$ (9)


is said to represent a central conic in standard position. The most important conics of this type are shown in Table 1.


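Table 1

$\dfrac{x^2}{\alpha^2} + \dfrac{y^2}{\beta^2} = 1$  ($\alpha > \beta > 0$)
$\dfrac{x^2}{\alpha^2} + \dfrac{y^2}{\beta^2} = 1$  ($\beta > \alpha > 0$)
$\dfrac{x^2}{\alpha^2} - \dfrac{y^2}{\beta^2} = 1$  ($\alpha > 0$, $\beta > 0$)
$\dfrac{y^2}{\beta^2} - \dfrac{x^2}{\alpha^2} = 1$  ($\alpha > 0$, $\beta > 0$)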
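If we take the constant f in Equations 8 and 9 to the right side and let $k = -f$, then we can rewrite these equations in matrix form as
$$\begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} a & b \\ b & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = k \quad \text{and} \quad \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} a & 0 \\ 0 & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = k$$ (10)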


The first of these corresponds to Equation 8 in which there is a cross product term 2bxy, and the second corresponds to Equation 
9 in which there is no cross product term. Geometrically, the existence of a cross product term signals that the graph of the 
quadratic form is rotated about the origin, as in Figure 7.3.2. The three-dimensional analogs of the equations in 10 are 


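$$\begin{bmatrix} x & y & z \end{bmatrix}\begin{bmatrix} a & d & e \\ d & b & f \\ e & f & c \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = k \quad \text{and} \quad \begin{bmatrix} x & y & z \end{bmatrix}\begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = k$$ (11)


If a, b, and c are not all zero, then the graphs of these equations in $R^3$ are called central quadrics; the second equation, which has no cross product terms, represents a central quadric in standard position.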


Identifying Conic Sections 


We are now ready to consider the first of the three problems posed earlier, identifying the curve or surface represented by an equation $\mathbf{x}^TA\mathbf{x} = k$ in two or three variables. We will focus on the two-variable case. We noted above that an equation of the form

$$ax^2 + 2bxy + cy^2 + f = 0$$ (12)


represents a central conic. If $b = 0$, then the conic is in standard position, and if $b \neq 0$, it is rotated. It is an easy matter to identify central conics in standard position by matching the equation with one of the standard forms. For example, the equation

$$9x^2 + 16y^2 - 144 = 0$$


can be rewritten as

$$\frac{x^2}{16} + \frac{y^2}{9} = 1$$

which, by comparison with Table 1, is an ellipse with $\alpha = 4$ and $\beta = 3$ (Figure 7.3.3).


Figure 7.3.3


If a central conic is rotated out of standard position, then it can be identified by first rotating the coordinate axes to put it in 
standard position and then matching the resulting equation with one of the standard forms in Table 1. To find a rotation that 
eliminates the cross product term in the equation 


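$$ax^2 + 2bxy + cy^2 = k$$ (13)
it will be convenient to express the equation in the matrix form

$$\mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} a & b \\ b & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = k$$ (14)


and look for a change of variable

$$\mathbf{x} = P\mathbf{x}'$$
that diagonalizes A and for which $\det(P) = 1$. Since we saw in Example 4 of Section 7.1 that the transition matrix
$$P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$ (15)


has the effect of rotating the xy-axes of a rectangular coordinate system through an angle $\theta$, our problem reduces to finding $\theta$ that diagonalizes A, thereby eliminating the cross product term in 13. If we make this change of variable, then in the x'y'-coordinate system, Equation 14 will become

$$\mathbf{x}'^TD\mathbf{x}' = \begin{bmatrix} x' & y' \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\begin{bmatrix} x' \\ y' \end{bmatrix} = k$$ (16)


where $\lambda_1$ and $\lambda_2$ are the eigenvalues of A. The conic can now be identified by writing 16 in the form

$$\lambda_1x'^2 + \lambda_2y'^2 = k$$ (17)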


and performing the necessary algebra to match it with one of the standard forms in Table 1. For example, if $\lambda_1$, $\lambda_2$, and k are positive, then 17 represents an ellipse with an axis of length $2\sqrt{k/\lambda_1}$ in the x'-direction and $2\sqrt{k/\lambda_2}$ in the y'-direction. The first column vector of P, which is a unit eigenvector corresponding to $\lambda_1$, is along the positive x'-axis; and the second column vector of P, which is a unit eigenvector corresponding to $\lambda_2$, is a unit vector along the y'-axis. These are called the principal axes of the ellipse, which explains why Theorem 7.3.1 is called "the principal axes theorem." (See Figure 7.3.4.)


Figure 7.3.4 (labels: unit eigenvector for $\lambda_1$, $(\cos\theta, \sin\theta)$; unit eigenvector for $\lambda_2$, $(-\sin\theta, \cos\theta)$)


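EXAMPLE 3 Identifying a Conic by Eliminating the Cross Product Term


(a) Identify the conic whose equation is $5x^2 - 4xy + 8y^2 - 36 = 0$ by rotating the xy-axes to put the conic in standard position.
(b) Find the angle $\theta$ through which you rotated the xy-axes in part (a).


Solution
(a) The given equation can be written in the matrix form
$$\mathbf{x}^TA\mathbf{x} = 36$$
where
$$A = \begin{bmatrix} 5 & -2 \\ -2 & 8 \end{bmatrix}$$
The characteristic polynomial of A is
$$\begin{vmatrix} \lambda - 5 & 2 \\ 2 & \lambda - 8 \end{vmatrix} = (\lambda - 4)(\lambda - 9)$$
so the eigenvalues are $\lambda = 4$ and $\lambda = 9$. We leave it for you to show that orthonormal bases for the eigenspaces are
$$\lambda = 4: \begin{bmatrix} \tfrac{2}{\sqrt{5}} \\ \tfrac{1}{\sqrt{5}} \end{bmatrix}, \qquad \lambda = 9: \begin{bmatrix} -\tfrac{1}{\sqrt{5}} \\ \tfrac{2}{\sqrt{5}} \end{bmatrix}$$


Thus, A is orthogonally diagonalized by
$$P = \begin{bmatrix} \tfrac{2}{\sqrt{5}} & -\tfrac{1}{\sqrt{5}} \\ \tfrac{1}{\sqrt{5}} & \tfrac{2}{\sqrt{5}} \end{bmatrix}$$ (18)


Moreover, it happens by chance that $\det(P) = 1$, so we are assured that the substitution $\mathbf{x} = P\mathbf{x}'$ performs a rotation of axes. (Had it turned out that $\det(P) = -1$, then we would have interchanged the columns to reverse the sign.) It follows from 16 that the equation of the conic in the x'y'-coordinate system is
$$\begin{bmatrix} x' & y' \end{bmatrix}\begin{bmatrix} 4 & 0 \\ 0 & 9 \end{bmatrix}\begin{bmatrix} x' \\ y' \end{bmatrix} = 36$$
which we can write as
$$4x'^2 + 9y'^2 = 36 \quad \text{or} \quad \frac{x'^2}{9} + \frac{y'^2}{4} = 1$$


We can now see from Table 1 that the conic is an ellipse whose axis has length $2\alpha = 6$ in the x'-direction and length $2\beta = 4$ in the y'-direction.


(b) It follows from 15 that

$$P = \begin{bmatrix} \tfrac{2}{\sqrt{5}} & -\tfrac{1}{\sqrt{5}} \\ \tfrac{1}{\sqrt{5}} & \tfrac{2}{\sqrt{5}} \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
which implies that
$$\cos\theta = \frac{2}{\sqrt{5}}, \qquad \sin\theta = \frac{1}{\sqrt{5}}, \qquad \tan\theta = \frac{\sin\theta}{\cos\theta} = \frac{1}{2}$$


Thus, $\theta = \tan^{-1}\tfrac{1}{2} \approx 26.6°$ (Figure 7.3.5).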


Figure 7.3.5 


Remark In the exercises we will ask you to show that if $b \neq 0$, then the cross product term in the equation
$$ax^2 + 2bxy + cy^2 = k$$
can be eliminated by a rotation through an angle $\theta$ that satisfies

$$\cot 2\theta = \frac{a - c}{2b}$$ (19)


We leave it for you to confirm that this is consistent with part (b) of the last example. 
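One way to carry out this confirmation is sketched below. It takes $a = 5$, $b = -2$, $c = 8$ from the equation of Example 3 and $\tan\theta = \tfrac{1}{2}$ from part (b), and uses the standard double-angle identity for the tangent; both routes give the same value of $\cot 2\theta$.

```latex
\cot 2\theta = \frac{a - c}{2b} = \frac{5 - 8}{2(-2)} = \frac{3}{4},
\qquad
\tan 2\theta = \frac{2\tan\theta}{1 - \tan^{2}\theta}
             = \frac{2 \cdot \tfrac{1}{2}}{1 - \tfrac{1}{4}} = \frac{4}{3}
\;\Longrightarrow\; \cot 2\theta = \frac{3}{4}
```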


Positive Definite Quadratic Forms 


We will now consider the second of the three problems posed earlier, determining conditions under which $\mathbf{x}^TA\mathbf{x} > 0$ for all nonzero values of $\mathbf{x}$. We will explain why this is important shortly, but first we introduce some terminology.


The terminology in Definition 1 also applies to the matrix A; that is, A is positive definite, negative definite, or indefinite in accordance with whether the associated quadratic form has that property.


DEFINITION 1


A quadratic form $\mathbf{x}^TA\mathbf{x}$ is said to be
positive definite if $\mathbf{x}^TA\mathbf{x} > 0$ for $\mathbf{x} \neq \mathbf{0}$
negative definite if $\mathbf{x}^TA\mathbf{x} < 0$ for $\mathbf{x} \neq \mathbf{0}$
indefinite if $\mathbf{x}^TA\mathbf{x}$ has both positive and negative values


The following theorem, whose proof is deferred to the end of the section, provides a way of using eigenvalues to determine whether a matrix A and its associated quadratic form $\mathbf{x}^TA\mathbf{x}$ are positive definite, negative definite, or indefinite.


THEOREM 7.3.2 


If A is a symmetric matrix, then: 


(a) $\mathbf{x}^TA\mathbf{x}$ is positive definite if and only if all eigenvalues of A are positive.
(b) $\mathbf{x}^TA\mathbf{x}$ is negative definite if and only if all eigenvalues of A are negative.
(c) $\mathbf{x}^TA\mathbf{x}$ is indefinite if and only if A has at least one positive eigenvalue and at least one negative eigenvalue.


Remark The three classifications in Definition 1 do not exhaust all of the possibilities. For example, a quadratic form for which $\mathbf{x}^TA\mathbf{x} \geq 0$ if $\mathbf{x} \neq \mathbf{0}$ is called positive semidefinite, and one for which $\mathbf{x}^TA\mathbf{x} \leq 0$ if $\mathbf{x} \neq \mathbf{0}$ is called negative semidefinite. Every positive definite form is positive semidefinite, but not conversely, and every negative definite form is negative semidefinite, but not conversely (why?). By adjusting the proof of Theorem 7.3.2 appropriately, one can prove that $\mathbf{x}^TA\mathbf{x}$ is positive semidefinite if and only if all eigenvalues of A are nonnegative and is negative semidefinite if and only if all eigenvalues of A are nonpositive.


EXAMPLE 4 Positive Definite Quadratic Forms


It is not usually possible to tell from the signs of the entries in a symmetric matrix A whether that matrix is positive definite, negative definite, or indefinite. For example, the entries of the matrix

$$A = \begin{bmatrix} 3 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{bmatrix}$$


are nonnegative, but the matrix is indefinite since its eigenvalues are $\lambda = 1, 4, -2$ (verify). To see this another way, let us write out the quadratic form as
$$\mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}\begin{bmatrix} 3 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 3x_1^2 + 2x_1x_2 + 2x_1x_3 + 4x_2x_3$$


Positive definite and negative definite matrices are invertible. Why?


We can now see, for example, that
$$\mathbf{x}^TA\mathbf{x} = 4 \quad \text{for} \quad x_1 = 0,\ x_2 = 1,\ x_3 = 1$$
and
$$\mathbf{x}^TA\mathbf{x} = -4 \quad \text{for} \quad x_1 = 0,\ x_2 = 1,\ x_3 = -1$$


Classifying Conic Sections Using Eigenvalues 


If $\mathbf{x}^TB\mathbf{x} = k$ is the equation of a conic, and if $k \neq 0$, then we can divide through by k and rewrite the equation in the form

$$\mathbf{x}^TA\mathbf{x} = 1$$ (20)


where $A = (1/k)B$. If we now rotate the coordinate axes to eliminate the cross product term (if any) in this equation, then the equation of the conic in the new coordinate system will be of the form

$$\lambda_1x'^2 + \lambda_2y'^2 = 1$$ (21)


in which $\lambda_1$ and $\lambda_2$ are the eigenvalues of A. The particular type of conic represented by this equation will depend on the signs of the eigenvalues $\lambda_1$ and $\lambda_2$. For example, you should be able to see from 21 that:


• $\mathbf{x}^TA\mathbf{x} = 1$ represents an ellipse if $\lambda_1 > 0$ and $\lambda_2 > 0$.
• $\mathbf{x}^TA\mathbf{x} = 1$ has no graph if $\lambda_1 < 0$ and $\lambda_2 < 0$.
• $\mathbf{x}^TA\mathbf{x} = 1$ represents a hyperbola if $\lambda_1$ and $\lambda_2$ have opposite signs.
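The three cases above can be turned into a small classifier. The sketch below is an illustration only: the function names and the tolerance `tol` are assumptions, and the eigenvalues of the symmetric 2 × 2 matrix with rows (a, b) and (b, c) are computed from the usual closed form (mean of the diagonal plus or minus a radius). The sample values come from the equation of Example 3 after dividing through by 36.

```python
import math


def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean + radius, mean - radius


def classify_conic(a, b, c, tol=1e-12):
    """Classify the graph of a*x^2 + 2*b*x*y + c*y^2 = 1 by eigenvalue signs."""
    lam1, lam2 = eigenvalues_2x2(a, b, c)  # lam1 >= lam2
    if lam2 > tol:
        return "ellipse"       # both eigenvalues positive
    if lam1 < -tol:
        return "no graph"      # both eigenvalues negative
    if lam1 > tol and lam2 < -tol:
        return "hyperbola"     # opposite signs
    return "not covered (zero eigenvalue)"


# 5x^2 - 4xy + 8y^2 = 36, divided by 36 so that the right side is 1
lam1, lam2 = eigenvalues_2x2(5 / 36, -2 / 36, 8 / 36)
kind = classify_conic(5 / 36, -2 / 36, 8 / 36)
# For an ellipse, each axis has length 2 / sqrt(eigenvalue).
axis_lengths = (2 / math.sqrt(lam1), 2 / math.sqrt(lam2))
```

The case of a zero eigenvalue is not one of the three listed above, so the sketch reports it separately instead of guessing.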


In the case of the ellipse, Equation 21 can be rewritten as

$$\frac{x'^2}{(1/\sqrt{\lambda_1})^2} + \frac{y'^2}{(1/\sqrt{\lambda_2})^2} = 1$$ (22)
so the axes of the ellipse have lengths $2/\sqrt{\lambda_1}$ and $2/\sqrt{\lambda_2}$ (Figure 7.3.6).


Figure 7.3.6


The following theorem is an immediate consequence of this discussion and Theorem 7.3.2. 


THEOREM 7.3.3 


If A is a symmetric 2 × 2 matrix, then:

(a) $\mathbf{x}^TA\mathbf{x} = 1$ represents an ellipse if A is positive definite.
(b) $\mathbf{x}^TA\mathbf{x} = 1$ has no graph if A is negative definite.
(c) $\mathbf{x}^TA\mathbf{x} = 1$ represents a hyperbola if A is indefinite.


In Example 3 we performed a rotation to show that the equation
$$5x^2 - 4xy + 8y^2 - 36 = 0$$


represents an ellipse with a major axis of length 6 and a minor axis of length 4. This conclusion can also be obtained by rewriting the equation in the form

$$\frac{5}{36}x^2 - \frac{1}{9}xy + \frac{2}{9}y^2 = 1$$
and showing that the associated matrix
$$A = \begin{bmatrix} \tfrac{5}{36} & -\tfrac{1}{18} \\ -\tfrac{1}{18} & \tfrac{2}{9} \end{bmatrix}$$


has eigenvalues $\lambda_1 = \tfrac{1}{9}$ and $\lambda_2 = \tfrac{1}{4}$. These eigenvalues are positive, so the matrix A is positive definite and the equation represents an ellipse. Moreover, it follows from 21 that the axes of the ellipse have lengths $2/\sqrt{\lambda_1} = 6$ and $2/\sqrt{\lambda_2} = 4$, which is consistent with Example 3.


Identifying Positive Definite Matrices 


Positive definite matrices are the most important symmetric matrices in applications, so it will be useful to learn a little more 
about them. We already know that a symmetric matrix is positive definite if and only if its eigenvalues are all positive; now we 
will give a criterion that can be used to determine whether a symmetric matrix is positive definite without finding the 
eigenvalues. For this purpose we define the kth principal submatrix of an n × n matrix A to be the k × k submatrix consisting of the first k rows and columns of A. For example, here are the principal submatrices of a general 4 × 4 matrix $A = [a_{ij}]$:


First principal submatrix: $\begin{bmatrix} a_{11} \end{bmatrix}$
Second principal submatrix: $\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$
Third principal submatrix: $\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$
Fourth principal submatrix: $\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}$


The following theorem, which we state without proof, provides a determinant test for ascertaining whether a symmetric matrix is 
positive definite. 


THEOREM 7.3.4 


A symmetric matrix A is positive definite if and only if the determinant of every principal submatrix is positive. 


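EXAMPLE 5 Working with Principal Submatrices


The matrix

$$A = \begin{bmatrix} 2 & -1 & -3 \\ -1 & 2 & 4 \\ -3 & 4 & 9 \end{bmatrix}$$
is positive definite since the determinants
$$|2| = 2, \qquad \begin{vmatrix} 2 & -1 \\ -1 & 2 \end{vmatrix} = 3, \qquad \begin{vmatrix} 2 & -1 & -3 \\ -1 & 2 & 4 \\ -3 & 4 & 9 \end{vmatrix} = 1$$


are all positive. Thus, we are guaranteed that all eigenvalues of A are positive and $\mathbf{x}^TA\mathbf{x} > 0$ for $\mathbf{x} \neq \mathbf{0}$.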
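The determinant test of Theorem 7.3.4 is easy to mechanize. The following sketch is one possible implementation, not a prescribed one: it computes determinants by Gaussian elimination over exact fractions, and the function names `det`, `principal_minors`, and `is_positive_definite` are hypothetical. It assumes the input matrix is symmetric, as the theorem requires.

```python
from fractions import Fraction


def det(M):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in M]
    n = len(M)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result  # a row swap flips the sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return result


def principal_minors(A):
    """Determinants of the 1st, 2nd, ..., nth principal submatrices of A."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]


def is_positive_definite(A):
    """Determinant test for a symmetric matrix A."""
    return all(m > 0 for m in principal_minors(A))


A = [[2, -1, -3], [-1, 2, 4], [-3, 4, 9]]   # matrix of Example 5
B = [[3, 1, 1], [1, 0, 2], [1, 2, 0]]       # indefinite matrix of Example 4
```

For the matrix of Example 5 the list returned by `principal_minors(A)` should reproduce the three determinants computed by hand, and the matrix of Example 4 should fail the test.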


OPTIONAL 


We conclude this section with an optional proof of Theorem 7.3.2. 


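Proofs of Theorem 7.3.2(a) and (b) It follows from the principal axes theorem (Theorem 7.3.1) that there is an orthogonal change of variable $\mathbf{x} = P\mathbf{y}$ for which

$$\mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2$$ (23)


where the $\lambda$'s are the eigenvalues of A. Moreover, it follows from the invertibility of P that $\mathbf{y} \neq \mathbf{0}$ if and only if $\mathbf{x} \neq \mathbf{0}$, so the values of $\mathbf{x}^TA\mathbf{x}$ for $\mathbf{x} \neq \mathbf{0}$ are the same as the values of $\mathbf{y}^TD\mathbf{y}$ for $\mathbf{y} \neq \mathbf{0}$. Thus, it follows from 23 that $\mathbf{x}^TA\mathbf{x} > 0$ for $\mathbf{x} \neq \mathbf{0}$ if and only if all of the $\lambda$'s in that equation are positive, and that $\mathbf{x}^TA\mathbf{x} < 0$ for $\mathbf{x} \neq \mathbf{0}$ if and only if all of the $\lambda$'s are negative. This proves parts (a) and (b).


Proof (c) Assume that A has at least one positive eigenvalue and at least one negative eigenvalue, and to be specific, suppose that $\lambda_1 > 0$ and $\lambda_2 < 0$ in 23. Then

$$\mathbf{x}^TA\mathbf{x} > 0 \quad \text{if } y_1 = 1 \text{ and all other } y\text{'s are } 0$$
and
$$\mathbf{x}^TA\mathbf{x} < 0 \quad \text{if } y_2 = 1 \text{ and all other } y\text{'s are } 0$$
which proves that $\mathbf{x}^TA\mathbf{x}$ is indefinite. Conversely, if $\mathbf{x}^TA\mathbf{x} > 0$ for some $\mathbf{x}$, then $\mathbf{y}^TD\mathbf{y} > 0$ for some $\mathbf{y}$, so at least one of the $\lambda$'s in 23 must be positive. Similarly, if $\mathbf{x}^TA\mathbf{x} < 0$ for some $\mathbf{x}$, then $\mathbf{y}^TD\mathbf{y} < 0$ for some $\mathbf{y}$, so at least one of the $\lambda$'s in 23 must be negative, which completes the proof.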


Concept Review 

• Linear form
• Quadratic form
• Cross product term
• Quadratic form associated with a matrix
• Change of variable
• Orthogonal change of variable
• Principal Axes Theorem
• Conic section
• Degenerate conic
• Central conic
• Standard position of a central conic
• Standard form of a central conic
• Central quadric
• Principal axes of an ellipse
• Positive definite quadratic form
• Negative definite quadratic form
• Indefinite quadratic form
• Positive semidefinite quadratic form
• Negative semidefinite quadratic form
• Principal submatrix

Skills 

• Express a quadratic form in the matrix notation $\mathbf{x}^TA\mathbf{x}$, where A is a symmetric matrix.
• Find an orthogonal change of variable that eliminates the cross product terms in a quadratic form, and express the quadratic form in terms of the new variable.
• Identify a conic section from an equation by rotating axes to place the conic in standard position, and find the angle of rotation.
• Identify a conic section using eigenvalues.
• Classify matrices and quadratic forms as positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite.


Exercise Set 7.3 


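In Exercises 1-2, express the quadratic form in the matrix notation $\mathbf{x}^TA\mathbf{x}$, where A is a symmetric matrix.


1. (a) $3x_1^2 + 7x_2^2$
(b) $4x_1^2 - 9x_2^2 - 6x_1x_2$
(c) $9x_1^2 - x_2^2 + 4x_3^2 + 6x_1x_2 - 8x_1x_3 + x_2x_3$


Answer:


(c) $\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}\begin{bmatrix} 9 & 3 & -4 \\ 3 & -1 & \tfrac{1}{2} \\ -4 & \tfrac{1}{2} & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$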


2. (a) $5x_1^2 + 5x_1x_2$
(b) $-7x_1x_2$
(c) $x_1^2 + x_2^2 - 3x_3^2 - 5x_1x_2 + 9x_1x_3$


In Exercises 3-4, find a formula for the quadratic form that does not use matrices. 


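3. $\begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 2 & -3 \\ -3 & 5 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$


Answer:


$2x^2 + 5y^2 - 6xy$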


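4. $\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}\begin{bmatrix} -2 & \tfrac{7}{2} & 1 \\ \tfrac{7}{2} & 0 & 6 \\ 1 & 6 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$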


In Exercises 5-8, find an orthogonal change of variables that eliminates the cross product terms in the quadratic form Q, and 
express Q in terms of the new variables. 


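5. $Q = 2x_1^2 + 2x_2^2 - 2x_1x_2$
Answer:
$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$; $Q = 3y_1^2 + y_2^2$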


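6. $Q = 5x_1^2 + 2x_2^2 + 4x_3^2 + 4x_1x_2$
7. $Q = 3x_1^2 + 4x_2^2 + 5x_3^2 + 4x_1x_2 - 4x_2x_3$


Answer:
$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -\tfrac{2}{3} & \tfrac{2}{3} & \tfrac{1}{3} \\ \tfrac{2}{3} & \tfrac{1}{3} & \tfrac{2}{3} \\ \tfrac{1}{3} & \tfrac{2}{3} & -\tfrac{2}{3} \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$; $Q = y_1^2 + 4y_2^2 + 7y_3^2$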


8. $Q = 2x_1^2 + 5x_2^2 + 5x_3^2 + 4x_1x_2 - 4x_1x_3 - 8x_2x_3$


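In Exercises 9-10, express the quadratic equation in the matrix form $\mathbf{x}^TA\mathbf{x} + K\mathbf{x} + f = 0$, where $\mathbf{x}^TA\mathbf{x}$ is the associated quadratic form and K is an appropriate matrix.


9. (a) $2x^2 + xy + x - 6y + 2 = 0$
(b) $y^2 + 7x - 8y - 5 = 0$


Answer:


(a) $\begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 2 & \tfrac{1}{2} \\ \tfrac{1}{2} & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 1 & -6 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + 2 = 0$

(b) $\begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 7 & -8 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} - 5 = 0$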


10. (a) $x^2 - xy + 5x + 8y - 3 = 0$
(b) $5xy = 8$


In Exercises 11-12, identify the conic section represented by the equation. 


11. (a) $2x^2 + 5y^2 = 20$
(b) $x^2 - y^2 - 8 = 0$
(c) $7y^2 - 2x = 0$
(d) $x^2 + y^2 - 25 = 0$


Answer:


(a) ellipse
(b) hyperbola
(c) parabola
(d) circle
12. (a) $4x^2 + 9y^2 = 1$
(b) $4x^2 - 5y^2 = 20$
(c) $-x^2 = 2y$
(d) $x^2 - 3 = -y^2$


In Exercises 13-16, identify the conic section represented by the equation by rotating axes to place the conic in standard 
position. Find an equation of the conic in the rotated coordinates, and find the angle of rotation. 


13. $2x^2 - 4xy - y^2 + 8 = 0$
Answer:
Hyperbola: $2(y')^2 - 3(x')^2 = 8$; $\theta \approx -26.6°$
14. $5x^2 + 4xy + 5y^2 = 9$
15. $11x^2 + 24xy + 4y^2 - 15 = 0$
Answer:


Hyperbola: $4(x')^2 - (y')^2 = 3$; $\theta \approx 36.9°$


16. $x^2 + xy + y^2 = \tfrac{1}{2}$


In Exercises 17-18, determine by inspection whether the matrix is positive definite, negative definite, indefinite, positive 
semidefinite, or negative semidefinite. 


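17. (a) $\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$
(b) $\begin{bmatrix} -1 & 0 \\ 0 & -2 \end{bmatrix}$
(c) $\begin{bmatrix} -1 & 0 \\ 0 & 2 \end{bmatrix}$
(d) $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$
(e) $\begin{bmatrix} 0 & 0 \\ 0 & -2 \end{bmatrix}$

Answer:


(a) Positive definite
(b) Negative definite
(c) Indefinite
(d) Positive semidefinite
(e) Negative semidefinite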


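18. (a) $\begin{bmatrix} 2 & 0 \\ 0 & -5 \end{bmatrix}$
(b) $\begin{bmatrix} -2 & 0 \\ 0 & -5 \end{bmatrix}$
(c) $\begin{bmatrix} 2 & 0 \\ 0 & 5 \end{bmatrix}$
(d) $\begin{bmatrix} 0 & 0 \\ 0 & -5 \end{bmatrix}$
(e) $\begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}$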
In Exercises 19-24, classify the quadratic form as positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite.
19. $x_1^2 + x_2^2$
Answer:


Positive definite
20. $-x_1^2 - 3x_2^2$
21. $(x_1 - x_2)^2$
Answer:


Positive semidefinite
22. $-(x_1 - x_2)^2$
23. $x_1^2 - x_2^2$

Answer:


Indefinite
24. $x_1x_2$


In Exercises 25-26, show that the matrix A is positive definite first by using Theorem 7.3.2 and second by using Theorem 7.3.4.


25.


(b) $A = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 3 \end{bmatrix}$


In Exercises 27-28, find all values of k for which the quadratic form is positive definite.
27. $5x_1^2 + x_2^2 + kx_3^2 + 4x_1x_2 - 2x_1x_3 - 2x_2x_3$


Answer:


$k > 2$
28. $3x_1^2 + x_2^2 + 2x_3^2 - 2x_1x_3 + 2kx_2x_3$


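29. Let $\mathbf{x}^TA\mathbf{x}$ be a quadratic form in the variables $x_1, x_2, \ldots, x_n$, and define $T: R^n \to R$ by $T(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$.
(a) Show that $T(\mathbf{x} + \mathbf{y}) = T(\mathbf{x}) + 2\mathbf{x}^TA\mathbf{y} + T(\mathbf{y})$.
(b) Show that $T(c\mathbf{x}) = c^2T(\mathbf{x})$.
30. Express the quadratic form $(c_1x_1 + c_2x_2 + \cdots + c_nx_n)^2$ in the matrix notation $\mathbf{x}^TA\mathbf{x}$, where A is symmetric.
31. In statistics, the quantities
$$\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$$
and
$$s_x^2 = \frac{1}{n-1}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right]$$
are called, respectively, the sample mean and sample variance of $\mathbf{x} = (x_1, x_2, \ldots, x_n)$.

(a) Express the quadratic form $s_x^2$ in the matrix notation $\mathbf{x}^TA\mathbf{x}$, where A is symmetric.

(b) Is $s_x^2$ a positive definite quadratic form? Explain.


Answer:
(a) $A = \begin{bmatrix} \tfrac{1}{n} & -\tfrac{1}{n(n-1)} & \cdots & -\tfrac{1}{n(n-1)} \\ -\tfrac{1}{n(n-1)} & \tfrac{1}{n} & \cdots & -\tfrac{1}{n(n-1)} \\ \vdots & \vdots & \ddots & \vdots \\ -\tfrac{1}{n(n-1)} & -\tfrac{1}{n(n-1)} & \cdots & \tfrac{1}{n} \end{bmatrix}$
(b) Yes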


32. The graph in an xyz-coordinate system of an equation of form $ax^2 + by^2 + cz^2 = 1$ in which a, b, and c are positive is a surface called a central ellipsoid in standard position (see the accompanying figure). This is the three-dimensional generalization of the ellipse $ax^2 + by^2 = 1$ in the xy-plane. The intersections of the ellipsoid $ax^2 + by^2 + cz^2 = 1$ with the coordinate axes determine three line segments called the axes of the ellipsoid. If a central ellipsoid is rotated about the origin so two or more of its axes do not coincide with any of the coordinate axes, then the resulting equation will have one or more cross product terms.

(a) Show that the equation

$$\tfrac{4}{3}x^2 + \tfrac{4}{3}y^2 + \tfrac{4}{3}z^2 + \tfrac{4}{3}xy + \tfrac{4}{3}xz + \tfrac{4}{3}yz = 1$$

represents an ellipsoid, and find the lengths of its axes. [Suggestion: Write the equation in the form $\mathbf{x}^TA\mathbf{x} = 1$ and make an orthogonal change of variable to eliminate the cross product terms.]


(b) What property must a symmetric 3 × 3 matrix have in order for the equation $\mathbf{x}^TA\mathbf{x} = 1$ to represent an ellipsoid?


Figure Ex-32


33. What property must a symmetric 2 × 2 matrix A have for $\mathbf{x}^TA\mathbf{x} = 1$ to represent a circle?
Answer:


A must have a positive eigenvalue of multiplicity 2.
34. Prove: If $b \neq 0$, then the cross product term can be eliminated from the quadratic form $ax^2 + 2bxy + cy^2$ by rotating the coordinate axes through an angle $\theta$ that satisfies the equation

$$\cot 2\theta = \frac{a - c}{2b}$$

35. Prove that if A is an n × n symmetric matrix all of whose eigenvalues are nonnegative, then $\mathbf{x}^TA\mathbf{x} \geq 0$ for all nonzero $\mathbf{x}$ in $R^n$.


True-False Exercises 


In parts (a)-(l) determine whether the statement is true or false, and justify your answer.


(a) A symmetric matrix with positive definite eigenvalues is positive definite. 


Answer: 


True 


(b) $x_1^2 - x_2^2 + x_3^2 + 4x_1x_2x_3$ is a quadratic form.


Answer: 


False 


(c) $(x_1 - 3x_2)^2$ is a quadratic form.


Answer: 


True 


(d) A positive definite matrix is invertible. 


Answer: 


True 


(e) A symmetric matrix is either positive definite, negative definite, or indefinite. 
Answer: 


False 


(f) If A is positive definite, then $-A$ is negative definite.
Answer: 


True 


(g) $\mathbf{x} \cdot \mathbf{x}$ is a quadratic form for all $\mathbf{x}$ in $R^n$.
Answer: 


True 


(h) If $\mathbf{x}^TA\mathbf{x}$ is a positive definite quadratic form, then so is $\mathbf{x}^TA^{-1}\mathbf{x}$.
Answer: 


True 


(i) If A is a matrix with only positive eigenvalues, then $\mathbf{x}^TA\mathbf{x}$ is a positive definite quadratic form.
Answer: 


False 


(j) If A is a 2 × 2 symmetric matrix with positive entries and $\det(A) > 0$, then A is positive definite.
Answer: 


True 


(k) If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form with no cross product terms, then A is a diagonal matrix.
Answer: 


False 


(l) If $\mathbf{x}^TA\mathbf{x}$ is a positive definite quadratic form in two variables and $c \neq 0$, then the graph of the equation $\mathbf{x}^TA\mathbf{x} = c$ is an ellipse.
Answer: 


False 




7.4 Optimization Using Quadratic Forms 


Quadratic forms arise in various problems in which the maximum or minimum value of some quantity is required. 
In this section we will discuss some problems of this type. 


Constrained Extremum Problems 


Our first goal in this section is to consider the problem of finding the maximum and minimum values of a quadratic form $\mathbf{x}^TA\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$. Problems of this type arise in a wide variety of applications.


To visualize this problem geometrically in the case where $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^2$, view $z = \mathbf{x}^TA\mathbf{x}$ as the equation of some surface in a rectangular xyz-coordinate system and view $\|\mathbf{x}\| = 1$ as the unit circle centered at the origin of the xy-plane. Geometrically, the problem of finding the maximum and minimum values of $\mathbf{x}^TA\mathbf{x}$ subject to the requirement $\|\mathbf{x}\| = 1$ amounts to finding the highest and lowest points on the intersection of the surface with the right circular cylinder determined by the circle (Figure 7.4.1).


Figure 7.4.1 (labels: constrained maximum, constrained minimum, unit circle)


The following theorem, whose proof is deferred to the end of the section, is the key result for solving problems of 
this type. 


THEOREM 7.4.1 Constrained Extremum Theorem 


Let A be a symmetric n × n matrix whose eigenvalues in order of decreasing size are $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. Then:

(a) the quadratic form $\mathbf{x}^TA\mathbf{x}$ attains a maximum value and a minimum value on the set of vectors for which $\|\mathbf{x}\| = 1$;

(b) the maximum value attained in part (a) occurs at a unit vector corresponding to the eigenvalue $\lambda_1$;

(c) the minimum value attained in part (a) occurs at a unit vector corresponding to the eigenvalue $\lambda_n$.


Remark The condition $\|\mathbf{x}\| = 1$ in this theorem is called a constraint, and the maximum or minimum value of $\mathbf{x}^TA\mathbf{x}$ subject to the constraint is called a constrained extremum. This constraint can also be expressed as $\mathbf{x}^T\mathbf{x} = 1$ or as $x_1^2 + x_2^2 + \cdots + x_n^2 = 1$, when convenient.


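EXAMPLE 1 Finding Constrained Extrema


Find the maximum and minimum values of the quadratic form
$$z = 5x^2 + 5y^2 + 4xy$$
subject to the constraint $x^2 + y^2 = 1$.


Solution The quadratic form can be expressed in matrix notation as
$$z = 5x^2 + 5y^2 + 4xy = \mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 5 & 2 \\ 2 & 5 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$


We leave it for you to show that the eigenvalues of A are $\lambda_1 = 7$ and $\lambda_2 = 3$ and that corresponding eigenvectors are
$$\lambda_1 = 7: \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \lambda_2 = 3: \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$


Normalizing these eigenvectors yields
$$\lambda_1 = 7: \begin{bmatrix} \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} \end{bmatrix}, \qquad \lambda_2 = 3: \begin{bmatrix} -\tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} \end{bmatrix}$$ (1)


Thus, the constrained extrema are

constrained maximum: $z = 7$ at $(x, y) = \left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)$
constrained minimum: $z = 3$ at $(x, y) = \left(-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)$


Remark Since the negatives of the eigenvectors in 1 are also unit eigenvectors, they too produce the maximum and minimum values of z; that is, the constrained maximum $z = 7$ also occurs at the point $(x, y) = \left(-\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right)$ and the constrained minimum $z = 3$ at $(x, y) = \left(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right)$.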
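The eigenvalue answer can be sanity-checked numerically by sampling the quadratic form around the unit circle. This brute-force sketch is not the method of Theorem 7.4.1; it only confirms the values. The sample count `N` and the variable names are arbitrary choices.

```python
import math


def z(x, y):
    """The quadratic form of Example 1."""
    return 5 * x * x + 5 * y * y + 4 * x * y


N = 3600  # number of equally spaced sample points on the unit circle
samples = [(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))
           for k in range(N)]
values = [z(x, y) for x, y in samples]

z_max, z_min = max(values), min(values)
# Points at which the sampled extremes occur
p_max = samples[values.index(z_max)]
p_min = samples[values.index(z_min)]
```

On the unit circle the form reduces to $5 + 2\sin 2t$, so the sampled maximum and minimum should be close to 7 and 3, attained near the unit eigenvectors found above.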
y2 ¥2 = (2 ¥2 


EXAMPLE 2 A Constrained Extremum Problem


A rectangle is to be inscribed in the ellipse $4x^2 + 9y^2 = 36$, as shown in Figure 7.4.2. Use eigenvalue methods to find nonnegative values of x and y that produce the inscribed rectangle with maximum area.


Figure 7.4.2 A rectangle inscribed in the ellipse $4x^2 + 9y^2 = 36$.


Solution The area z of the inscribed rectangle is given by $z = 4xy$, so the problem is to maximize the quadratic form $z = 4xy$ subject to the constraint $4x^2 + 9y^2 = 36$. In this problem, the graph of the constraint equation is an ellipse rather than the unit circle as required in Theorem 7.4.1, but we can remedy this problem by rewriting the constraint as

$$\left(\frac{x}{3}\right)^2 + \left(\frac{y}{2}\right)^2 = 1$$
and defining new variables, $x_1$ and $y_1$, by the equations
$$x = 3x_1 \quad \text{and} \quad y = 2y_1$$
This enables us to reformulate the problem as follows:
maximize $z = 4xy = 24x_1y_1$
subject to the constraint
$$x_1^2 + y_1^2 = 1$$


To solve this problem, we will write the quadratic form $z = 24x_1y_1$ as

$$z = \mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x_1 & y_1 \end{bmatrix}\begin{bmatrix} 0 & 12 \\ 12 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$$


We now leave it for you to show that the largest eigenvalue of A is $\lambda = 12$ and that the only corresponding unit eigenvector with nonnegative entries is

$$\mathbf{x} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} \end{bmatrix}$$


Thus, the maximum area is $z = 12$, and this occurs when

$$x = 3x_1 = \frac{3}{\sqrt{2}} \quad \text{and} \quad y = 2y_1 = \frac{2}{\sqrt{2}} = \sqrt{2}$$
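As a quick check of this result, the values $x = 3/\sqrt{2}$ and $y = \sqrt{2}$ can be substituted back into the original constraint and the area formula:

```latex
4x^{2} + 9y^{2} = 4\cdot\frac{9}{2} + 9\cdot 2 = 18 + 18 = 36,
\qquad
z = 4xy = 4\cdot\frac{3}{\sqrt{2}}\cdot\sqrt{2} = 12
```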


Constrained Extrema and Level Curves 


A useful way of visualizing the behavior of a function $f(x, y)$ of two variables is to consider the curves in the xy-plane along which $f(x, y)$ is constant. These curves have equations of the form

$$f(x, y) = k$$


and are called the level curves of f (Figure 7.4.3). In particular, the level curves of a quadratic form $\mathbf{x}^TA\mathbf{x}$ on $R^2$ have equations of the form
$$\mathbf{x}^TA\mathbf{x} = k$$ (2)


so the maximum and minimum values of $\mathbf{x}^TA\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$ are the largest and smallest values of k for which the graph of 2 intersects the unit circle. Typically, such values of k produce level curves that just touch the unit circle (Figure 7.4.4), and the coordinates of the points where the level curves just touch produce the vectors that maximize or minimize $\mathbf{x}^TA\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$.


Figure 7.4.3 (label: level curve $f(x, y) = k$)


Figure 7.4.4


EXAMPLE 3 Example 1 Revisited Using Level Curves 


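In Example 1 (and its following remark) we found the maximum and minimum values of the quadratic form

$$z = 5x^2 + 5y^2 + 4xy$$
subject to the constraint $x^2 + y^2 = 1$. We showed that the constrained maximum is $z = 7$, and this is attained at the points

$$(x, y) = \left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right) \quad \text{and} \quad (x, y) = \left(-\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right)$$ (3)

and that the constrained minimum is $z = 3$, and this is attained at the points

$$(x, y) = \left(-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right) \quad \text{and} \quad (x, y) = \left(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right)$$ (4)


Geometrically, this means that the level curve $5x^2 + 5y^2 + 4xy = 7$ should just touch the unit circle at the points in 3, and the level curve $5x^2 + 5y^2 + 4xy = 3$ should just touch it at the points in 4. All of this is consistent with Figure 7.4.5.


Figure 7.4.5 (labels: level curves $5x^2 + 5y^2 + 4xy = 7$ and $5x^2 + 5y^2 + 4xy = 3$)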


CALCULUS REQUIRED 
Relative Extrema of Functions of Two Variables 


We will conclude this section by showing how quadratic forms can be used to study characteristics of real-valued 
functions of two variables. 


Recall that if a function $f(x, y)$ has first-order partial derivatives, then its relative maxima and minima, if any, occur at points where

$$f_x(x, y) = 0 \quad \text{and} \quad f_y(x, y) = 0$$


These are called critical points of f. The specific behavior of f at a critical point $(x_0, y_0)$ is determined by the sign of

$$D(x, y) = f(x, y) - f(x_0, y_0)$$ (5)


at points $(x, y)$ that are close to, but different from, $(x_0, y_0)$:


• If $D(x, y) > 0$ at points $(x, y)$ that are sufficiently close to, but different from, $(x_0, y_0)$, then $f(x_0, y_0) < f(x, y)$ at such points and f is said to have a relative minimum at $(x_0, y_0)$ (Figure 7.4.6a).

• If $D(x, y) < 0$ at points $(x, y)$ that are sufficiently close to, but different from, $(x_0, y_0)$, then $f(x_0, y_0) > f(x, y)$ at such points and f is said to have a relative maximum at $(x_0, y_0)$ (Figure 7.4.6b).

• If $D(x, y)$ has both positive and negative values inside every circle centered at $(x_0, y_0)$, then there are points $(x, y)$ that are arbitrarily close to $(x_0, y_0)$ at which $f(x_0, y_0) < f(x, y)$ and points $(x, y)$ that are arbitrarily close to $(x_0, y_0)$ at which $f(x_0, y_0) > f(x, y)$. In this case we say that f has a saddle point at $(x_0, y_0)$ (Figure 7.4.6c).


Figure 7.4.6 (a) Relative minimum at (0, 0); (b) relative maximum at (0, 0); (c) saddle point at (0, 0)


In general, it can be difficult to determine the sign of (5) directly. However, the following theorem, which is proved
in calculus, makes it possible to analyze critical points using derivatives.


THEOREM 7.4.2 Second Derivative Test 


Suppose that $(x_0, y_0)$ is a critical point of $f(x, y)$ and that f has continuous second-order partial
derivatives in some circular region centered at $(x_0, y_0)$. Then:

(a) f has a relative minimum at $(x_0, y_0)$ if
$f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0$ and $f_{xx}(x_0, y_0) > 0$
(b) f has a relative maximum at $(x_0, y_0)$ if
$f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0$ and $f_{xx}(x_0, y_0) < 0$
(c) f has a saddle point at $(x_0, y_0)$ if
$f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) < 0$
(d) The test is inconclusive if
$f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) = 0$


Our interest here is in showing how to reformulate this theorem using properties of symmetric matrices. For this 
purpose we consider the symmetric matrix 


$$H(x, y) = \begin{bmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{xy}(x, y) & f_{yy}(x, y) \end{bmatrix}$$

which is called the Hessian or Hessian matrix of f in honor of the German mathematician and scientist Ludwig
Otto Hesse (1811-1874). The notation $H(x, y)$ emphasizes that the entries in the matrix depend on x and y. The
Hessian is of interest because

$$\det[H(x_0, y_0)] = \begin{vmatrix} f_{xx}(x_0, y_0) & f_{xy}(x_0, y_0) \\ f_{xy}(x_0, y_0) & f_{yy}(x_0, y_0) \end{vmatrix} = f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0)$$


is the expression that appears in Theorem 7.4.2. We can now reformulate the second derivative test as follows. 


THEOREM 7.4.3. Hessian Form of the Second Derivative Test 


Suppose that $(x_0, y_0)$ is a critical point of $f(x, y)$ and that f has continuous second-order partial
derivatives in some circular region centered at $(x_0, y_0)$. If $H(x_0, y_0)$ is the Hessian of f at $(x_0, y_0)$, then:

(a) f has a relative minimum at $(x_0, y_0)$ if $H(x_0, y_0)$ is positive definite.
(b) f has a relative maximum at $(x_0, y_0)$ if $H(x_0, y_0)$ is negative definite.
(c) f has a saddle point at $(x_0, y_0)$ if $H(x_0, y_0)$ is indefinite.
(d) The test is inconclusive otherwise.


We will prove part (a). The proofs of the remaining parts will be left as exercises. 


Proof (a) If $H(x_0, y_0)$ is positive definite, then Theorem 7.3.4 implies that the principal submatrices of
$H(x_0, y_0)$ have positive determinants. Thus,

$$\det[H(x_0, y_0)] = \begin{vmatrix} f_{xx}(x_0, y_0) & f_{xy}(x_0, y_0) \\ f_{xy}(x_0, y_0) & f_{yy}(x_0, y_0) \end{vmatrix} = f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0$$

and

$$\det[f_{xx}(x_0, y_0)] = f_{xx}(x_0, y_0) > 0$$

so f has a relative minimum at $(x_0, y_0)$ by part (a) of Theorem 7.4.2.


EXAMPLE 4 Using the Hessian to Classify Relative Extrema


Find the critical points of the function

$$f(x, y) = \tfrac{1}{3}x^3 + xy^2 - 8xy + 3$$

and use the eigenvalues of the Hessian matrix at those points to determine which of them, if any, are
relative maxima, relative minima, or saddle points.


Solution To find both the critical points and the Hessian matrix we will need to calculate the first
and second partial derivatives of f. These derivatives are

$$f_x(x, y) = x^2 + y^2 - 8y, \quad f_y(x, y) = 2xy - 8x, \quad f_{xy}(x, y) = 2y - 8$$
$$f_{xx}(x, y) = 2x, \quad f_{yy}(x, y) = 2x$$

Thus, the Hessian matrix is

$$H(x, y) = \begin{bmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{xy}(x, y) & f_{yy}(x, y) \end{bmatrix} = \begin{bmatrix} 2x & 2y - 8 \\ 2y - 8 & 2x \end{bmatrix}$$

To find the critical points we set $f_x$ and $f_y$ equal to zero. This yields the equations

$$f_x(x, y) = x^2 + y^2 - 8y = 0 \quad\text{and}\quad f_y(x, y) = 2xy - 8x = 2x(y - 4) = 0$$

Solving the second equation yields $x = 0$ or $y = 4$. Substituting $x = 0$ in the first equation and
solving for y yields $y = 0$ or $y = 8$; and substituting $y = 4$ into the first equation and solving for x
yields $x = 4$ or $x = -4$. Thus, we have four critical points:

(0, 0), (0, 8), (4, 4), (-4, 4)


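Evaluating the Hessian matrix at these points yields

$$H(0, 0) = \begin{bmatrix} 0 & -8 \\ -8 & 0 \end{bmatrix}, \qquad H(0, 8) = \begin{bmatrix} 0 & 8 \\ 8 & 0 \end{bmatrix}$$

$$H(4, 4) = \begin{bmatrix} 8 & 0 \\ 0 & 8 \end{bmatrix}, \qquad H(-4, 4) = \begin{bmatrix} -8 & 0 \\ 0 & -8 \end{bmatrix}$$

We leave it for you to find the eigenvalues of these matrices and deduce the following classifications
of the critical points:

Critical Point (x0, y0) | λ1 | λ2 | Classification
(0, 0) | 8 | -8 | Saddle point
(0, 8) | 8 | -8 | Saddle point
(4, 4) | 8 | 8 | Relative minimum
(-4, 4) | -8 | -8 | Relative maximum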

OPTIONAL 


We conclude this section with an optional proof of Theorem 7.4.1. 


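Proof of Theorem 7.4.1 The first step in the proof is to show that $\mathbf{x}^T A\mathbf{x}$ has constrained maximum and minimum
values for $\|\mathbf{x}\| = 1$. Since A is symmetric, the principal axes theorem (Theorem 7.3.1) implies that there is an
orthogonal change of variable $\mathbf{x} = P\mathbf{y}$ such that

$$\mathbf{x}^T A\mathbf{x} = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \tag{6}$$

in which $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A. Let us assume that $\|\mathbf{x}\| = 1$ and that the column vectors of P
(which are unit eigenvectors of A) have been ordered so that

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \tag{7}$$

Since the matrix P is orthogonal, multiplication by P is length preserving, so that $\|\mathbf{y}\| = \|\mathbf{x}\| = 1$; that is,

$$y_1^2 + y_2^2 + \cdots + y_n^2 = 1$$

It follows from this equation and (7) that

$$\lambda_n = \lambda_n\left(y_1^2 + y_2^2 + \cdots + y_n^2\right) \le \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \le \lambda_1\left(y_1^2 + y_2^2 + \cdots + y_n^2\right) = \lambda_1$$

and hence from (6) that

$$\lambda_n \le \mathbf{x}^T A\mathbf{x} \le \lambda_1$$

This shows that all values of $\mathbf{x}^T A\mathbf{x}$ for which $\|\mathbf{x}\| = 1$ lie between the largest and smallest eigenvalues of A. Now
let x be a unit eigenvector corresponding to $\lambda_1$. Then

$$\mathbf{x}^T A\mathbf{x} = \mathbf{x}^T(\lambda_1\mathbf{x}) = \lambda_1\mathbf{x}^T\mathbf{x} = \lambda_1\|\mathbf{x}\|^2 = \lambda_1$$

which shows that $\mathbf{x}^T A\mathbf{x}$ has $\lambda_1$ as a constrained maximum and that this maximum occurs if x is a unit eigenvector
of A corresponding to $\lambda_1$. Similarly, if x is a unit eigenvector corresponding to $\lambda_n$, then

$$\mathbf{x}^T A\mathbf{x} = \mathbf{x}^T(\lambda_n\mathbf{x}) = \lambda_n\mathbf{x}^T\mathbf{x} = \lambda_n\|\mathbf{x}\|^2 = \lambda_n$$

so $\mathbf{x}^T A\mathbf{x}$ has $\lambda_n$ as a constrained minimum and this minimum occurs if x is a unit eigenvector of A corresponding
to $\lambda_n$. This completes the proof.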
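
As a concrete instance of the inequality chain in the proof (a worked illustration added here, not part of the original argument), take the symmetric matrix of the quadratic form $5x^2 + 5y^2 + 4xy$ from Example 3, whose eigenvalues are 7 and 3:

```latex
\[
A=\begin{bmatrix}5&2\\2&5\end{bmatrix},\qquad \lambda_1=7,\quad \lambda_2=3,
\]
\[
\mathbf{x}^TA\mathbf{x}=7y_1^2+3y_2^2,\qquad y_1^2+y_2^2=1,
\]
\[
3=3\left(y_1^2+y_2^2\right)\le 7y_1^2+3y_2^2\le 7\left(y_1^2+y_2^2\right)=7 .
\]
```

The bounds 3 and 7 are attained at $(y_1, y_2) = (0, \pm 1)$ and $(\pm 1, 0)$, respectively, which correspond to unit eigenvectors of A.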


Concept Review 


Constraint 


Constrained extremum 


Level curve 


Critical point 


Relative minimum 


Relative maximum 


Saddle point 


Second derivative test 


Hessian matrix 


Skills 
• Find the maximum and minimum values of a quadratic form subject to a constraint.

• Find the critical points of a real-valued function of two variables, and use the eigenvalues of the Hessian
matrix at the critical points to classify them as relative maxima, relative minima, or saddle points.


Exercise Set 7.4 


In Exercises 1-4, find the maximum and minimum values of the given quadratic form subject to the constraint
$x^2 + y^2 = 1$, and determine the values of x and y at which the maximum and minimum occur.


1. $5x^2 - y^2$
Answer:

Maximum: 5 at (1, 0) and (-1, 0); minimum: -1 at (0, 1) and (0, -1)

2. $xy$
3. $3x^2 + 7y^2$
Answer:

Maximum: 7 at (0, 1) and (0, -1); minimum: 3 at (1, 0) and (-1, 0)
4. $5x^2 + 5xy$

In Exercises 5-6, find the maximum and minimum values of the given quadratic form subject to the constraint
$x^2 + y^2 + z^2 = 1$
and determine the values of x, y, and z at which the maximum and minimum occur.


5. $9x^2 + 4y^2 + 3z^2$

Answer:

Maximum: 9 at (1, 0, 0) and (-1, 0, 0); minimum: 3 at (0, 0, 1) and (0, 0, -1)
6. $2x^2 + y^2 + z^2 + 2xy + 2xz$
7. Use the method of Example 2 to find the maximum and minimum values of xy subject to the constraint
$4x^2 + 8y^2 = 16$.


Answer: 


Maximum: $z = 4\sqrt{2}$ at $(x, y) = (2\sqrt{2}, 2)$ and $(-2\sqrt{2}, -2)$; minimum: $z = -4\sqrt{2}$ at
$(x, y) = (-2\sqrt{2}, 2)$ and $(2\sqrt{2}, -2)$

8. Use the method of Example 2 to find the maximum and minimum values of $x^2 + xy + 2y^2$ subject to the
constraint $x^2 + 3y^2 = 16$.

In Exercises 9-10, draw the unit circle and the level curves corresponding to the given quadratic form. Show that
the unit circle intersects each of these curves in exactly two places, label the intersection points, and verify that
the constrained extrema occur at those points.


9. $5x^2 - y^2$
Answer: (figure showing the unit circle and the level curves $5x^2 - y^2 = k$)
10. $xy$


11. (a) Show that the function $f(x, y) = 4xy - x^4 - y^4$ has critical points at (0, 0), (1, 1), and (-1, -1).

(b) Use the Hessian form of the second derivative test to show f has relative maxima at (1, 1) and (-1, -1)
and a saddle point at (0, 0).


12. (a) Show that the function $f(x, y) = x^3 - 6xy - y^3$ has critical points at (0, 0) and (-2, 2).

(b) Use the Hessian form of the second derivative test to show f has a relative maximum at (-2, 2) and a
saddle point at (0, 0).


In Exercises 13-16, find the critical points of f, if any, and classify them as relative maxima, relative minima, or
saddle points.


13. $f(x, y) = x^3 - 3xy - y^3$
Answer:

Critical points: (-1, 1), relative maximum; (0, 0), saddle point

14. $f(x, y) = x^3 - 3xy + y^3$

15. $f(x, y) = x^2 + 2y^2 - x^2y$

Answer:

Critical points: (0, 0), relative minimum; (2, 1) and (-2, 1), saddle points

16. $f(x, y) = x^3 + y^3 - 3x - 3y$


17. A rectangle whose center is at the origin and whose sides are parallel to the coordinate axes is to be inscribed
in the ellipse $x^2 + 25y^2 = 25$. Use the method of Example 2 to find nonnegative values of x and y that
produce the inscribed rectangle with maximum area.

Answer:

Corner points: $\left(\pm\tfrac{5}{\sqrt{2}}, \pm\tfrac{1}{\sqrt{2}}\right)$


18. Suppose that the temperature at a point (x, y) on a metal plate is $T(x, y) = 4x^2 - 4xy + y^2$. An ant, walking
on the plate, traverses a circle of radius 5 centered at the origin. What are the highest and lowest temperatures
encountered by the ant?


19. (a) Show that the functions
$f(x, y) = x^4 + y^4$ and $g(x, y) = x^4 - y^4$
have a critical point at (0, 0) but the second derivative test is inconclusive at that point.

(b) Give a reasonable argument to show that f has a relative minimum at (0, 0) and g has a saddle point at
(0, 0).


20. Suppose that the Hessian matrix of a certain quadratic form $f(x, y)$ is


Ed 


What can you say about the location and classification of the critical points of f? 


21. Suppose that A is an $n \times n$ symmetric matrix and
$q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x}$
where x is a vector in $R^n$ that is expressed in column form. What can you say about the value of q if x is a unit
eigenvector corresponding to an eigenvalue $\lambda$ of A?

Answer:

$q(\mathbf{x}) = \lambda$


22. Prove: If $\mathbf{x}^T A\mathbf{x}$ is a quadratic form whose minimum and maximum values subject to the constraint $\|\mathbf{x}\| = 1$
are m and M, respectively, then for each number c in the interval $m \le c \le M$, there is a unit vector $\mathbf{x}_c$ such that
$\mathbf{x}_c^T A\mathbf{x}_c = c$. [Hint: In the case where $m < M$, let $\mathbf{u}_m$ and $\mathbf{u}_M$ be unit eigenvectors of A such that $\mathbf{u}_m^T A\mathbf{u}_m = m$
and $\mathbf{u}_M^T A\mathbf{u}_M = M$, and let

$$\mathbf{x}_c = \sqrt{\frac{M - c}{M - m}}\,\mathbf{u}_m + \sqrt{\frac{c - m}{M - m}}\,\mathbf{u}_M$$

Show that $\mathbf{x}_c^T A\mathbf{x}_c = c$.]
True-False Exercises 
In parts (a)-(e) determine whether the statement is true or false, and justify your answer.
(a) A quadratic form must have either a maximum or minimum value. 

Answer: 


False 


(b) The maximum value of a quadratic form $\mathbf{x}^T A\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$ occurs at a unit eigenvector
corresponding to the largest eigenvalue of A.


Answer: 


True 


(c) The Hessian matrix of a function f with continuous second-order partial derivatives is a symmetric matrix.
Answer: 


True 


(d) If $(x_0, y_0)$ is a critical point of a function f and the Hessian of f at $(x_0, y_0)$ is 0, then f has neither a relative
maximum nor a relative minimum at $(x_0, y_0)$.


Answer: 


False 


(e) If A is a symmetric matrix and $\det A < 0$, then the minimum of $\mathbf{x}^T A\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$ is
negative.
Answer: 


True 




7.5 Hermitian, Unitary, and Normal Matrices 


We know that every real symmetric matrix is orthogonally diagonalizable and that the real symmetric matrices 
are the only orthogonally diagonalizable matrices. In this section we will consider the diagonalization problem 
for complex matrices. 


Hermitian and Unitary Matrices 


The transpose operation is less important for complex matrices than for real matrices. A more useful operation 
for complex matrices is given in the following definition. 


DEFINITION 1 


If A is a complex matrix, then the conjugate transpose of A, denoted by $A^*$, is defined by

$$A^* = \bar{A}^T \tag{1}$$


Remark Since part (b) of Theorem 5.3.2 states that $\overline{(A^T)} = (\bar{A})^T$, the order in which the transpose and
conjugation operations are performed in computing $A^* = \bar{A}^T$ does not matter. Moreover, in the case where A
has real entries we have $A^* = (\bar{A})^T = A^T$, so $A^*$ is the same as $A^T$ for real matrices.


EXAMPLE 1 Conjugate Transpose 


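Find the conjugate transpose $A^*$ of the matrix

$$A = \begin{bmatrix} 1+i & -i & 0 \\ 2 & 3-2i & i \end{bmatrix}$$

Solution We have

$$\bar{A} = \begin{bmatrix} 1-i & i & 0 \\ 2 & 3+2i & -i \end{bmatrix} \quad\text{and hence}\quad A^* = \bar{A}^T = \begin{bmatrix} 1-i & 2 \\ i & 3+2i \\ 0 & -i \end{bmatrix}$$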
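
The computation in Example 1 is mechanical enough to sketch in code. The function below is a hypothetical helper (not from the text) that stores a matrix as a list of rows of Python complex numbers, conjugates every entry, and swaps rows with columns.

```python
def conjugate_transpose(matrix):
    """Return A* for a matrix given as a list of rows."""
    rows, cols = len(matrix), len(matrix[0])
    # Entry (j, i) of A* is the complex conjugate of entry (i, j) of A.
    return [[matrix[i][j].conjugate() for i in range(rows)] for j in range(cols)]

# The 2 x 3 matrix of Example 1 (1j is Python's notation for i).
A = [[1 + 1j, -1j, 0],
     [2, 3 - 2j, 1j]]

A_star = conjugate_transpose(A)
for row in A_star:
    print(row)
```

Applying the function twice returns the original matrix, which mirrors part (a) of Theorem 7.5.1.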


The following theorem, parts of which are given as exercises, shows that the basic algebraic 
properties of the conjugate transpose operation are similar to those of the transpose (compare to 
Theorem 1.4.8). 


THEOREM 7.5.1 


If k is a complex scalar, and if A, B, and C are complex matrices whose sizes are such that the stated 
operations can be performed, then: 


(a) $(A^*)^* = A$

(b) $(A + B)^* = A^* + B^*$

(c) $(A - B)^* = A^* - B^*$

(d) $(kA)^* = \bar{k}A^*$

(e) $(AB)^* = B^*A^*$


Remark Note that the relationship $\mathbf{u} \cdot \mathbf{v} = \bar{\mathbf{v}}^T\mathbf{u}$ in Formula 5 of Section 5.3 can be expressed in terms of the
conjugate transpose as

$$\mathbf{u} \cdot \mathbf{v} = \mathbf{v}^*\mathbf{u} \tag{2}$$


We are now ready to define two new classes of matrices that will be important in our study of diagonalization
in $C^n$.


DEFINITION 2 


A square complex matrix A is said to be unitary if

$$A^{-1} = A^* \tag{3}$$

and is said to be Hermitian if

$$A^* = A \tag{4}$$


Note that a unitary matrix can also be defined as a square complex matrix A for which

$$AA^* = A^*A = I$$


If A is a real matrix, then $A^* = A^T$, in which case (3) becomes $A^{-1} = A^T$ and (4) becomes $A^T = A$. Thus, the
unitary matrices are complex generalizations of the real orthogonal matrices and Hermitian matrices are
complex generalizations of the real symmetric matrices.


EXAMPLE 2 Recognizing Hermitian Matrices 


Hermitian matrices are easy to recognize because their diagonal entries are real (why?), and the 
entries that are symmetrically positioned across the main diagonal are complex conjugates. Thus, 
for example, we can tell by inspection that 


$$A = \begin{bmatrix} 1 & i & 1+i \\ -i & -5 & 2-i \\ 1-i & 2+i & 3 \end{bmatrix}$$


is Hermitian. 
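
The inspection described in Example 2 amounts to comparing each entry with the conjugate of its mirror image. A minimal sketch follows; the function name is illustrative, and exact equality is used, which is only appropriate for entries that are represented exactly (as they are here).

```python
def is_hermitian(matrix):
    """Check A* = A entry by entry for a square matrix given as a list of rows."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        return False  # only square matrices can be Hermitian
    return all(
        matrix[i][j] == complex(matrix[j][i]).conjugate()
        for i in range(n)
        for j in range(n)
    )

# The matrix of Example 2.
A = [[1, 1j, 1 + 1j],
     [-1j, -5, 2 - 1j],
     [1 - 1j, 2 + 1j, 3]]

# A hypothetical variant with a non-real diagonal entry, for contrast.
B = [[1j, 1j, 1 + 1j],
     [-1j, -5, 2 - 1j],
     [1 - 1j, 2 + 1j, 3]]

print(is_hermitian(A), is_hermitian(B))
```

The diagonal comparison forces each diagonal entry to equal its own conjugate, which is why the diagonal entries of a Hermitian matrix must be real.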


The fact that real symmetric matrices have real eigenvalues is a special case of the following more general 
result about Hermitian matrices, the proof of which is left for the exercises. 


THEOREM 7.5.2 


The eigenvalues of a Hermitian matrix are real numbers. 


The fact that eigenvectors from different eigenspaces of a real symmetric matrix are orthogonal is a special 
case of the following more general result about Hermitian matrices. 


THEOREM 7.5.3 


If A is a Hermitian matrix, then eigenvectors from different eigenspaces are orthogonal. 


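Proof Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be eigenvectors of A corresponding to distinct eigenvalues $\lambda_1$ and $\lambda_2$. Using Formula 2
and the facts that $\lambda_1 = \bar{\lambda}_1$, $\lambda_2 = \bar{\lambda}_2$, and $A = A^*$ we can write

$$\lambda_1(\mathbf{v}_2 \cdot \mathbf{v}_1) = (\lambda_1\mathbf{v}_1)^*\mathbf{v}_2 = (A\mathbf{v}_1)^*\mathbf{v}_2 = (\mathbf{v}_1^*A^*)\mathbf{v}_2 = (\mathbf{v}_1^*A)\mathbf{v}_2 = \mathbf{v}_1^*(A\mathbf{v}_2) = \mathbf{v}_1^*(\lambda_2\mathbf{v}_2) = \lambda_2(\mathbf{v}_1^*\mathbf{v}_2) = \lambda_2(\mathbf{v}_2 \cdot \mathbf{v}_1)$$

This implies that $(\lambda_1 - \lambda_2)(\mathbf{v}_2 \cdot \mathbf{v}_1) = 0$ and hence that $\mathbf{v}_2 \cdot \mathbf{v}_1 = 0$ (since $\lambda_1 \ne \lambda_2$).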


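EXAMPLE 3 Eigenvalues and Eigenvectors of a Hermitian Matrix


Confirm that the Hermitian matrix

$$A = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix}$$

has real eigenvalues and that eigenvectors from different eigenspaces are orthogonal.


Solution The characteristic polynomial of A is

$$\det(\lambda I - A) = \begin{vmatrix} \lambda - 2 & -1-i \\ -1+i & \lambda - 3 \end{vmatrix} = (\lambda - 2)(\lambda - 3) - (-1-i)(-1+i) = (\lambda^2 - 5\lambda + 6) - 2 = (\lambda - 1)(\lambda - 4)$$

so the eigenvalues of A are $\lambda = 1$ and $\lambda = 4$, which are real. Bases for the eigenspaces of A can be obtained
by solving the linear system

$$\begin{bmatrix} \lambda - 2 & -1-i \\ -1+i & \lambda - 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

with $\lambda = 1$ and with $\lambda = 4$. We leave it for you to do this and to show that the general solutions of these
systems are

$$\lambda = 1\colon\ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = t\begin{bmatrix} -1-i \\ 1 \end{bmatrix} \quad\text{and}\quad \lambda = 4\colon\ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = t\begin{bmatrix} \tfrac{1}{2}(1+i) \\ 1 \end{bmatrix}$$

Thus, bases for these eigenspaces are

$$\lambda = 1\colon\ \mathbf{v}_1 = \begin{bmatrix} -1-i \\ 1 \end{bmatrix} \quad\text{and}\quad \lambda = 4\colon\ \mathbf{v}_2 = \begin{bmatrix} \tfrac{1}{2}(1+i) \\ 1 \end{bmatrix}$$

The vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ are orthogonal since

$$\mathbf{v}_1 \cdot \mathbf{v}_2 = (-1-i)\,\overline{\left(\tfrac{1}{2}(1+i)\right)} + (1)(1) = \tfrac{1}{2}(-1-i)(1-i) + 1 = 0$$

and hence all scalar multiples of them are also orthogonal.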
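
The claims of Example 3 can be verified directly: that $\mathbf{v}_1$ and $\mathbf{v}_2$ are eigenvectors for the eigenvalues 1 and 4, and that their complex Euclidean inner product is zero. The helper names below are illustrative; the inner product follows the convention $\mathbf{u} \cdot \mathbf{v} = \sum u_k \bar{v}_k$ used in this section.

```python
def mat_vec(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def inner(u, v):
    """Complex Euclidean inner product u . v = sum of u_k * conj(v_k)."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

A = [[2, 1 + 1j],
     [1 - 1j, 3]]
v1 = [-1 - 1j, 1]            # basis vector for the eigenspace of lambda = 1
v2 = [(1 + 1j) / 2, 1]       # basis vector for the eigenspace of lambda = 4

Av1 = mat_vec(A, v1)         # should equal 1 * v1
Av2 = mat_vec(A, v2)         # should equal 4 * v2
dot = inner(v1, v2)          # should be 0

print(Av1, Av2, dot)
```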


Unitary matrices are not usually easy to recognize by inspection. However, the following analog of Theorems 
7.1.1 and 7.1.3, part of which is proved in the exercises, provides a way of ascertaining whether a matrix is 


unitary without computing its inverse. 


THEOREM 7.5.4 


If A is an $n \times n$ matrix with complex entries, then the following are equivalent.

(a) A is unitary.

(b) $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all x in $C^n$.

(c) $A\mathbf{x} \cdot A\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$ for all x and y in $C^n$.

(d) The column vectors of A form an orthonormal set in $C^n$ with respect to the complex Euclidean
inner product.

(e) The row vectors of A form an orthonormal set in $C^n$ with respect to the complex Euclidean inner
product.


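EXAMPLE 4 A Unitary Matrix


Use Theorem 7.5.4 to show that

$$A = \begin{bmatrix} \tfrac{1}{2}(1+i) & \tfrac{1}{2}(1+i) \\ \tfrac{1}{2}(1-i) & \tfrac{1}{2}(-1+i) \end{bmatrix}$$

is unitary, and then find $A^{-1}$.


Solution We will show that the row vectors

$$\mathbf{r}_1 = \begin{bmatrix} \tfrac{1}{2}(1+i) & \tfrac{1}{2}(1+i) \end{bmatrix} \quad\text{and}\quad \mathbf{r}_2 = \begin{bmatrix} \tfrac{1}{2}(1-i) & \tfrac{1}{2}(-1+i) \end{bmatrix}$$

are orthonormal. The relevant computations are

$$\|\mathbf{r}_1\| = \sqrt{\left|\tfrac{1}{2}(1+i)\right|^2 + \left|\tfrac{1}{2}(1+i)\right|^2} = \sqrt{\tfrac{1}{2} + \tfrac{1}{2}} = 1$$

$$\|\mathbf{r}_2\| = \sqrt{\left|\tfrac{1}{2}(1-i)\right|^2 + \left|\tfrac{1}{2}(-1+i)\right|^2} = \sqrt{\tfrac{1}{2} + \tfrac{1}{2}} = 1$$

$$\mathbf{r}_1 \cdot \mathbf{r}_2 = \left(\tfrac{1}{2}(1+i)\right)\overline{\left(\tfrac{1}{2}(1-i)\right)} + \left(\tfrac{1}{2}(1+i)\right)\overline{\left(\tfrac{1}{2}(-1+i)\right)} = \left(\tfrac{1}{2}(1+i)\right)\left(\tfrac{1}{2}(1+i)\right) + \left(\tfrac{1}{2}(1+i)\right)\left(\tfrac{1}{2}(-1-i)\right) = \tfrac{1}{2}i - \tfrac{1}{2}i = 0$$

Since we now know that A is unitary, it follows that

$$A^{-1} = A^* = \begin{bmatrix} \tfrac{1}{2}(1-i) & \tfrac{1}{2}(1+i) \\ \tfrac{1}{2}(1-i) & \tfrac{1}{2}(-1-i) \end{bmatrix}$$

You can confirm the validity of this result by showing that $AA^* = A^*A = I$.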
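
The confirmation suggested at the end of Example 4 can be sketched as follows. The helper functions are illustrative (not from the text), and the comparison with the identity uses a small tolerance, an assumption made to absorb floating-point rounding.

```python
def conjugate_transpose(M):
    """Return M* for a matrix given as a list of rows."""
    return [[complex(M[i][j]).conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_identity(M, tol=1e-12):
    """True if M agrees with the identity matrix up to the tolerance tol."""
    return all(abs(M[i][j] - (1 if i == j else 0)) < tol
               for i in range(len(M)) for j in range(len(M)))

# The matrix of Example 4.
A = [[(1 + 1j) / 2, (1 + 1j) / 2],
     [(1 - 1j) / 2, (-1 + 1j) / 2]]
A_star = conjugate_transpose(A)

print(is_identity(mat_mul(A, A_star)), is_identity(mat_mul(A_star, A)))
```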


Unitary Diagonalizability 


Since unitary matrices are the complex analogs of the real orthogonal matrices, the following definition is a 
natural generalization of orthogonal diagonalizability for real matrices. 


DEFINITION 3 


A square complex matrix is said to be unitarily diagonalizable if there is a unitary matrix P such that
$P^*AP = D$ is a complex diagonal matrix. Any such matrix P is said to unitarily diagonalize A.


Recall that a real symmetric $n \times n$ matrix A has an orthonormal set of n eigenvectors and is orthogonally
diagonalized by any $n \times n$ matrix whose column vectors are an orthonormal set of eigenvectors of A. Here is
the complex analog of that result.


THEOREM 7.5.5


Every $n \times n$ Hermitian matrix A has an orthonormal set of n eigenvectors and is unitarily diagonalized
by any $n \times n$ matrix P whose column vectors form an orthonormal set of eigenvectors of A.


The procedure for unitarily diagonalizing a Hermitian matrix A is exactly the same as that for orthogonally 
diagonalizing a symmetric matrix: 


Unitarily Diagonalizing a Hermitian Matrix 


Step 1. Find a basis for each eigenspace of A. 


Step 2. Apply the Gram-Schmidt process to each of these bases to obtain orthonormal bases for the 
eigenspaces. 


Step 3. Form the matrix P whose column vectors are the basis vectors obtained in Step 2. This will 
be a unitary matrix (Theorem 7.5.4) and will unitarily diagonalize A. 


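EXAMPLE 5 Unitary Diagonalization of a Hermitian Matrix


Find a matrix P that unitarily diagonalizes the Hermitian matrix

$$A = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix}$$


Solution We showed in Example 3 that the eigenvalues of A are $\lambda = 1$ and $\lambda = 4$ and that bases
for the corresponding eigenspaces are

$$\lambda = 1\colon\ \mathbf{v}_1 = \begin{bmatrix} -1-i \\ 1 \end{bmatrix} \quad\text{and}\quad \lambda = 4\colon\ \mathbf{v}_2 = \begin{bmatrix} \tfrac{1}{2}(1+i) \\ 1 \end{bmatrix}$$

Since each eigenspace has only one basis vector, the Gram-Schmidt process is simply a matter of
normalizing these basis vectors. We leave it for you to show that

$$\mathbf{p}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} = \begin{bmatrix} \frac{-1-i}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix} \quad\text{and}\quad \mathbf{p}_2 = \frac{\mathbf{v}_2}{\|\mathbf{v}_2\|} = \begin{bmatrix} \frac{1+i}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{bmatrix}$$

Thus, A is unitarily diagonalized by the matrix

$$P = \begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_2 \end{bmatrix} = \begin{bmatrix} \frac{-1-i}{\sqrt{3}} & \frac{1+i}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \end{bmatrix}$$

Although it is a little tedious, you may want to check this result by showing that

$$P^*AP = \begin{bmatrix} \frac{-1+i}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \frac{1-i}{\sqrt{6}} & \frac{2}{\sqrt{6}} \end{bmatrix}\begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix}\begin{bmatrix} \frac{-1-i}{\sqrt{3}} & \frac{1+i}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}$$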
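
The tedious check at the end of Example 5 is easy to delegate to a short script. This sketch uses illustrative helper names and floating-point arithmetic, so the product $P^*AP$ is expected to match the diagonal matrix with entries 1 and 4 only up to rounding.

```python
import math

def conjugate_transpose(M):
    """Return M* for a matrix given as a list of rows."""
    return [[complex(M[i][j]).conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2, 1 + 1j],
     [1 - 1j, 3]]

s3, s6 = math.sqrt(3), math.sqrt(6)
# Columns of P are the normalized eigenvectors p1 and p2 from Example 5.
P = [[(-1 - 1j) / s3, (1 + 1j) / s6],
     [1 / s3, 2 / s6]]

D = mat_mul(mat_mul(conjugate_transpose(P), A), P)
for row in D:
    print(row)
```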


Skew-Symmetric and Skew-Hermitian Matrices 


In Exercise 37 of Section 1.7 we defined a square matrix with real entries to be skew-symmetric if $A^T = -A$.
A skew-symmetric matrix must have zeros on the main diagonal (why?), and each entry off the main diagonal
must be the negative of its mirror image about the main diagonal. Here is an example.

$$A = \begin{bmatrix} 0 & 1 & -2 \\ -1 & 0 & 4 \\ 2 & -4 & 0 \end{bmatrix} \quad [\text{skew-symmetric}]$$

We leave it for you to confirm that $A^T = -A$.

The complex analogs of the skew-symmetric matrices are the matrices for which $A^* = -A$. Such matrices are
said to be skew-Hermitian.


Since a skew-Hermitian matrix A has the property

$$\bar{A}^T = -A$$

it must be that A has zeros or pure imaginary numbers on the main diagonal (why?), and that the complex
conjugate of each entry off the main diagonal is the negative of its mirror image about the main diagonal. Here
is an example.

$$A = \begin{bmatrix} i & 1-i & 5 \\ -1-i & 2i & i \\ -5 & i & 0 \end{bmatrix} \quad [\text{skew-Hermitian}]$$


Normal Matrices 


Hermitian matrices enjoy many, but not all, of the properties of real symmetric matrices. For example, we 
know that real symmetric matrices are orthogonally diagonalizable and Hermitian matrices are unitarily 
diagonalizable. However, whereas the real symmetric matrices are the only orthogonally diagonalizable 
matrices, the Hermitian matrices do not constitute the entire class of unitarily diagonalizable complex matrices; 
that is, there exist unitarily diagonalizable matrices that are not Hermitian. Specifically, it can be proved that a 
square complex matrix A is unitarily diagonalizable if and only if

$$AA^* = A^*A$$

Matrices with this property are said to be normal. Normal matrices include the Hermitian, skew-Hermitian,
and unitary matrices in the complex case and the symmetric, skew-symmetric, and orthogonal matrices in the 
real case. The nonzero skew-symmetric matrices are particularly interesting because they are examples of real 
matrices that are not orthogonally diagonalizable but are unitarily diagonalizable. 
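
The defining condition $AA^* = A^*A$ is simple to test. In the sketch below, the matrix `S` is a nonzero real skew-symmetric matrix (normal but not symmetric), and `T` is a hypothetical example chosen here to show a matrix that fails the test; neither matrix nor the helper names come from the text. Exact equality is used because all entries are integers.

```python
def conjugate_transpose(M):
    """Return M* for a matrix given as a list of rows."""
    return [[M[i][j].conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_normal(M):
    """Check whether M M* equals M* M (exact comparison)."""
    M_star = conjugate_transpose(M)
    return mat_mul(M, M_star) == mat_mul(M_star, M)

S = [[0, 1],
     [-1, 0]]     # real skew-symmetric: normal, but not symmetric
T = [[1, 1],
     [0, 1]]      # illustrative matrix that is not normal

print(is_normal(S), is_normal(T))
```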


A Comparison of Eigenvalues 


We have seen that Hermitian matrices have real eigenvalues. In the exercises we will ask you to show that the 
eigenvalues of a skew-Hermitian matrix are either zero or purely imaginary (have real part of zero) and that the 
eigenvalues of unitary matrices have modulus 1. These ideas are illustrated schematically in Figure 7.5.1. 


Figure 7.5.1 (eigenvalues in the complex plane: real eigenvalues for Hermitian matrices, pure imaginary eigenvalues for skew-Hermitian matrices, and eigenvalues with $|\lambda| = 1$ for unitary matrices)


Concept Review 

• Conjugate transpose

• Unitary matrix

• Hermitian matrix

• Unitarily diagonalizable matrix

• Skew-symmetric matrix

• Skew-Hermitian matrix

• Normal matrix


Skills 

• Find the conjugate transpose of a matrix.

• Be able to identify Hermitian matrices.

• Find the inverse of a unitary matrix.

• Find a unitary matrix that diagonalizes a Hermitian matrix.


Exercise Set 7.5 


In Exercises 1-2, find $A^*$.


1. $$A = \begin{bmatrix} 2i & 1-i \\ 4 & 3+i \\ 5+i & 0 \end{bmatrix}$$
Answer:

$$A^* = \begin{bmatrix} -2i & 4 & 5-i \\ 1+i & 3-i & 0 \end{bmatrix}$$

2. 4_[2 1-2 -143 
45-% <i 


In Exercises 3-4, substitute numbers for the x's so that A is Hermitian. 


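3. $$A = \begin{bmatrix} 1 & i & 2-3i \\ \times & -3 & 1 \\ \times & \times & 2 \end{bmatrix}$$
Answer:
$$A = \begin{bmatrix} 1 & i & 2-3i \\ -i & -3 & 1 \\ 2+3i & 1 & 2 \end{bmatrix}$$
4. $$A = \begin{bmatrix} 2 & 0 & 3+5i \\ \times & -4 & -i \\ \times & \times & 6 \end{bmatrix}$$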


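In Exercises 5-6, show that A is not Hermitian for any choice of the ×'s.


5. (a) $$A = \begin{bmatrix} 1 & i & 2-3i \\ -i & -3 & \times \\ 2-3i & \times & \times \end{bmatrix}$$
(b) $$A = \begin{bmatrix} \times & \times & 3+5i \\ 0 & i & -i \\ 3-5i & i & \times \end{bmatrix}$$
Answer:

(a) $a_{13} \ne \bar{a}_{31}$
(b) $a_{22} \ne \bar{a}_{22}$


6. (a) $$A = \begin{bmatrix} 1 & 1+i & \times \\ 1+i & 7 & \times \\ 6-2i & \times & 0 \end{bmatrix}$$

(b) $$A = \begin{bmatrix} 1 & \times & 3+5i \\ \times & 3 & 1-i \\ 3-5i & \times & 2+i \end{bmatrix}$$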

In Exercises 7-8, verify that the eigenvalues of the Hermitian matrix A are real and that eigenvectors from 
different eigenspaces are orthogonal (see Theorem 7.5.3). 


In Exercises 9-12, show that A is unitary, and find $A^{-1}$.


9. 3 4, 
5 5 
A= 
_4 3, 
5. > 
Answer: 
a A 
* — 5 5 
A" =A += 
sds, 35 
5 5 
10. a 


ths 

I 

re 
oh 


I 
hol] 
—— 
—s 
4 
ou. 
ee” 
holo 
OO 
—_s 
ou 
ad 


Answer 
-i+ ¥3 1—iy3 
Ata at 2y2 2y2 
en ed eee eT 
2y2 2y2 
12 1 : 1 
: —(=$1+7) —(1=) 
We. & 


ae 2 


(3 V6 


In Exercises 13-18, find a unitary matrix P that diagonalizes the Hermitian matrix A, and determine $P^{-1}AP$.


13. 4 1-3 
A= 
fee 5 


Answer: 


A= 0 1 
—2—33 —1 4 
20. $$A = \begin{bmatrix} 0 & 0 & 3-5i \\ \times & 0 & -i \\ \times & \times & 0 \end{bmatrix}$$

In Exercises 21—22, show that A is not skew-Hermitian for any choice of the x's. 


21. (a) 0 i 2—3 
= —j 0 x 
2+3% x x 
(b) 1 x 3—=—53 
A= x 23 —j 
=—3453 i 33 
Answer: 


(a) 413% — 431 
(b) ayy —aij 


22. (a) i x 2=3 
A=| x 0 1 +i 
2+3i —-l-i «x 
(b) 0 —i 4+ 
A= * 0 x 
—4-7i x 1 


In Exercises 23-24, verify that the eigenvalues of the skew-Hermitian matrix A are pure imaginary numbers. 


*4=| 0 pa 


1+i i 
24 0 3 
“A= 
5 | 


In Exercises 25—26, show that A is normal. 


25. 142: 243 =2=i 
A=| 2+i 1+i <i 
=—2=-i =i 1+i 

26. 2+ 2i ? 1=3 

A= i —2a 1-33 

1—i 1-3) -—3+8 


27. Show that the matrix

$$A = \frac{1}{\sqrt{2}}\begin{bmatrix} e^{i\theta} & e^{-i\theta} \\ ie^{i\theta} & -ie^{-i\theta} \end{bmatrix}$$

is unitary for all real values of $\theta$. [Note: See Formula 17 in Appendix B for the definition of $e^{i\theta}$.]


28. Prove that each entry on the main diagonal of a skew-Hermitian matrix is either zero or a pure imaginary
number.


29. Let A be any $n \times n$ matrix with complex entries, and define the matrices B and C to be

$$B = \tfrac{1}{2}(A + A^*) \quad\text{and}\quad C = \tfrac{1}{2i}(A - A^*)$$

(a) Show that B and C are Hermitian.
(b) Show that $A = B + iC$ and $A^* = B - iC$.
(c) What condition must B and C satisfy for A to be normal?

Answer:

(c) B and C must commute.


30. Show that if A is an $n \times n$ matrix with complex entries, and if u and v are vectors in $C^n$ that are expressed
in column form, then

$$A\mathbf{u} \cdot \mathbf{v} = \mathbf{u} \cdot A^*\mathbf{v} \quad\text{and}\quad \mathbf{u} \cdot A\mathbf{v} = A^*\mathbf{u} \cdot \mathbf{v}$$

31. Show that if A is a unitary matrix, then so is $A^*$.

32. Show that the eigenvalues of a skew-Hermitian matrix are either zero or purely imaginary.

33. Show that the eigenvalues of a unitary matrix have modulus 1.

34. Show that if u is a nonzero vector in $C^n$ that is expressed in column form, then $P = \mathbf{u}\mathbf{u}^*$ is Hermitian.

35. Show that if u is a unit vector in $C^n$ that is expressed in column form, then $H = I - 2\mathbf{u}\mathbf{u}^*$ is Hermitian and
unitary.

36. What can you say about the inverse of a matrix A that is both Hermitian and unitary?

37. Find a $2 \times 2$ matrix that is both Hermitian and unitary and whose entries are not all real numbers.
Answer:

$$\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} \\ -\frac{i}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix}$$


39. What geometric interpretations might you reasonably give to multiplication by the matrices $P = \mathbf{u}\mathbf{u}^*$ and
$H = I - 2\mathbf{u}\mathbf{u}^*$ in Exercises 34 and 35?

Answer:

Multiplication of x by P corresponds to $\|\mathbf{u}\|^2$ times the orthogonal projection of x onto $W = \operatorname{span}\{\mathbf{u}\}$. If
$\|\mathbf{u}\| = 1$, then multiplication of x by $H = I - 2\mathbf{u}\mathbf{u}^*$ corresponds to reflection of x about the hyperplane $\mathbf{u}^\perp$.

40. Prove that if A is an invertible matrix, then $A^*$ is invertible, and $(A^*)^{-1} = (A^{-1})^*$.
41. (a) Prove that $\det(\bar{A}) = \overline{\det(A)}$.
(b) Use the result in part (a) and the fact that a square matrix and its transpose have the same determinant
to prove that $\det(A^*) = \overline{\det(A)}$.


42. Use part (b) of Exercise 41 to prove: 

(a) If A is Hermitian, then det(A) is real. 

(b) If A is unitary, then |det(.4)| = 1. 
43. Use properties of the transpose and complex conjugate to prove parts (a) and (e) of Theorem 7.5.1. 
44. Use properties of the transpose and complex conjugate to prove parts (b) and (d) of Theorem 7.5.1. 


45. Prove that an $n \times n$ matrix A with complex entries is unitary if and only if the columns of A form an
orthonormal set in $C^n$.


46. Prove that the eigenvalues of a Hermitian matrix are real. 


True-False Exercises 


In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 


(a) The matrix : | is Hermitian. 
i 


Answer: 
False 
(b) ae ee ee ee 

y2 yo y3 

The matrix 0 F. 7 is unitary. 
(2 yo y3 

Answer: 

False 


(c) The conjugate transpose of a unitary matrix is unitary. 


Answer: 


True 


(d) Every unitarily diagonalizable matrix is Hermitian. 
Answer: 


False 


(e) A positive integer power of a skew-Hermitian matrix is skew-Hermitian. 
Answer: 


False 




Chapter 7 Supplementary Exercises 


1. Verify that each matrix is orthogonal, and find its inverse. 


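(a) $$\begin{bmatrix} \frac{3}{5} & -\frac{4}{5} \\ \frac{4}{5} & \frac{3}{5} \end{bmatrix}$$
(b) $$\begin{bmatrix} \frac{4}{5} & 0 & -\frac{3}{5} \\ -\frac{9}{25} & \frac{4}{5} & -\frac{12}{25} \\ \frac{12}{25} & \frac{3}{5} & \frac{16}{25} \end{bmatrix}$$
Answer:
(a) $$\begin{bmatrix} \frac{3}{5} & -\frac{4}{5} \\ \frac{4}{5} & \frac{3}{5} \end{bmatrix}^{-1} = \begin{bmatrix} \frac{3}{5} & \frac{4}{5} \\ -\frac{4}{5} & \frac{3}{5} \end{bmatrix}$$
(b) $$\begin{bmatrix} \frac{4}{5} & 0 & -\frac{3}{5} \\ -\frac{9}{25} & \frac{4}{5} & -\frac{12}{25} \\ \frac{12}{25} & \frac{3}{5} & \frac{16}{25} \end{bmatrix}^{-1} = \begin{bmatrix} \frac{4}{5} & -\frac{9}{25} & \frac{12}{25} \\ 0 & \frac{4}{5} & \frac{3}{5} \\ -\frac{3}{5} & -\frac{12}{25} & \frac{16}{25} \end{bmatrix}$$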


2. Prove: If Q is an orthogonal matrix, then each entry of Q is the same as its cofactor if det(@)) = 1 and is 
the negative of its cofactor if det(Q) = — 1. 


3. Prove that if A is a positive definite symmetric matrix, and if u and v are vectors in $R^n$ in column form, then
$\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^T A\mathbf{v}$
is an inner product on $R^n$.


4. Find the characteristic polynomial and the dimensions of the eigenspaces of the symmetric matrix 


3 2 2 
23 2 
22 3 
5. Find a matrix P that orthogonally diagonalizes

$$A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$$

and determine the diagonal matrix $D = P^TAP$.


Answer: 


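$$P = \begin{bmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \end{bmatrix}; \qquad P^TAP = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$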


6. Express each quadratic form in the matrix notation $\mathbf{x}^T A\mathbf{x}$.
(a) $-4x_1^2 + 16x_2^2 - 15x_1x_2$
(b) $9x_1^2 - x_2^2 + 4x_3^2 + 6x_1x_2 - 8x_1x_3 + x_2x_3$
7. Classify the quadratic form
$x_1^2 - 3x_1x_2 + 4x_2^2$
as positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite.


Answer: 


positive definite 


8. Find an orthogonal change of variable that eliminates the cross product terms in each quadratic form, and 
express the quadratic form in terms of the new variables. 


(a) —3x? { 5x3 + 2x 4x9 
(b) —5x? + he —x3 + 6x1x3+4x1x3 
9. Identify the type of conic section represented by each equation. 
(a) $y - x^2 = 0$
(b) $3x - 11y^2 = 0$


Answer: 


(a) parabola 
(b) parabola 


10. Find a unitary matrix U that diagonalizes 


Do + 
—_ — © 


and determine the diagonal matrix $D = U^{-1}AU$.


11. Show that if U is an $n \times n$ unitary matrix and


kil=Fal=*** =Exl=1 


then the product 


is also unitary. 
12. Suppose that $A^* = -A$.
(a) Show that $iA$ is Hermitian.


(b) Show that A is unitarily diagonalizable and has pure imaginary eigenvalues. 




CHAPTER 8


Linear Transformations 


CHAPTER CONTENTS 


8.1. General Linear Transformations 

8.2. Isomorphism 

8.3. Compositions and Inverse Transformations 
8.4. Matrices for General Linear Transformations 


8.5. Similarity 


INTRODUCTION 


In Section 4.9 and Section 4.10 we studied linear transformations from $R^n$ to $R^m$. In this
chapter we will define and study linear transformations from a general vector space V to a
general vector space W. The results we obtain here have important applications in physics,
engineering, and various branches of mathematics.




8.1 General Linear Transformations 


Up to now our study of linear transformations has focused on transformations from $R^n$ to $R^m$. In this section we
will turn our attention to linear transformations involving general vector spaces. We will illustrate ways in which
such transformations arise, and we will establish a fundamental relationship between general n-dimensional vector
spaces and $R^n$.


Definitions and Terminology 


In Section 4.9 we defined a matrix transformation $T_A\colon R^n \to R^m$ to be a mapping of the form
$T_A(\mathbf{x}) = A\mathbf{x}$
in which A is an $m \times n$ matrix. We subsequently established in Theorem 4.10.2 and Theorem 4.10.3 that the matrix
transformations are precisely the linear transformations from $R^n$ to $R^m$, that is, the transformations with the
linearity properties
$T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ and $T(k\mathbf{u}) = kT(\mathbf{u})$


We will use these two properties as the starting point for defining more general linear transformations. 


DEFINITION 1 


If $T: V \to W$ is a function from a vector space V to a vector space W, then T is called a linear
transformation from V to W if the following two properties hold for all vectors u and v in V and for all
scalars k:


(i) $T(ku) = kT(u)$ [Homogeneity property]

(ii) $T(u + v) = T(u) + T(v)$ [Additivity property]

In the special case where $V = W$, the linear transformation T is called a linear operator on the vector space
V.


The homogeneity and additivity properties of a linear transformation $T: V \to W$ can be used in combination to
show that if $v_1$ and $v_2$ are vectors in V and $k_1$ and $k_2$ are any scalars, then


$T(k_1 v_1 + k_2 v_2) = k_1 T(v_1) + k_2 T(v_2)$


More generally, if $v_1, v_2, \ldots, v_r$ are vectors in V and $k_1, k_2, \ldots, k_r$ are any scalars, then


$T(k_1 v_1 + k_2 v_2 + \cdots + k_r v_r) = k_1 T(v_1) + k_2 T(v_2) + \cdots + k_r T(v_r)$ (1)


The following theorem is an analog of parts (a) and (d) of Theorem 4.9.1. 


THEOREM 8.1.1 


If $T: V \to W$ is a linear transformation, then:


(a) $T(0) = 0$.
(b) $T(u - v) = T(u) - T(v)$ for all u and v in V.


Proof Let u be any vector in V. Since $0u = 0$, it follows from the homogeneity property in Definition 1 that


$T(0) = T(0u) = 0\,T(u) = 0$


which proves (a).


We can prove part (b) by rewriting $T(u - v)$ as


$T(u - v) = T(u + (-1)v)$
$= T(u) + (-1)T(v)$
$= T(u) - T(v)$


We leave it for you to justify each step. 


Use the two parts of Theorem 8.1.1 to prove that
$T(-v) = -T(v)$


for all v in V.
EXAMPLE 1 Matrix Transformations

Because we have based the definition of a general linear transformation on the homogeneity and
additivity properties of matrix transformations, it follows that a matrix transformation $T_A: R^n \to R^m$ is
also a linear transformation in this more general sense with $V = R^n$ and $W = R^m$.


EXAMPLE 2 The Zero Transformation


Let V and W be any two vector spaces. The mapping $T: V \to W$ such that $T(v) = 0$ for every v in V is a
linear transformation called the zero transformation. To see that T is linear, observe that


$T(u + v) = 0$, $T(u) = 0$, $T(v) = 0$, and $T(ku) = 0$
Therefore,


$T(u + v) = T(u) + T(v)$ and $T(ku) = kT(u)$


EXAMPLE 3 The Identity Operator


Let V be any vector space. The mapping $I: V \to V$ defined by $I(v) = v$ is called the identity operator on
V. We will leave it for you to verify that I is linear.


EXAMPLE 4 Dilation and Contraction Operators


If V is a vector space and k is any scalar, then the mapping $T: V \to V$ given by $T(x) = kx$ is a linear
operator on V, for if c is any scalar and if u and v are any vectors in V, then


$T(cu) = k(cu) = c(ku) = cT(u)$

$T(u + v) = k(u + v) = ku + kv = T(u) + T(v)$
If $0 < k < 1$, then T is called the contraction of V with factor k, and if $k > 1$, it is called the dilation of V
with factor k (Figure 8.1.1).


Figure 8.1.1 Dilation of V and contraction of V.


EXAMPLE 5 A Linear Transformation from $P_n$ to $P_{n+1}$


Let $p = p(x) = c_0 + c_1 x + \cdots + c_n x^n$ be a polynomial in $P_n$, and define the transformation
$T: P_n \to P_{n+1}$ by

$T(p) = T(p(x)) = xp(x) = c_0 x + c_1 x^2 + \cdots + c_n x^{n+1}$
This transformation is linear because for any scalar k and any polynomials $p_1$ and $p_2$ in $P_n$ we have


$T(kp) = T(kp(x)) = x(kp(x)) = k(xp(x)) = kT(p)$


and


$T(p_1 + p_2) = T(p_1(x) + p_2(x)) = x(p_1(x) + p_2(x))$
$= xp_1(x) + xp_2(x) = T(p_1) + T(p_2)$
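
The multiplication-by-x transformation of Example 5 can be sketched in code by storing a polynomial as its list of coefficients $[c_0, c_1, \ldots, c_n]$. This representation, the helper names, and the sample polynomials are assumptions made for illustration; they are not part of the text.

```python
def times_x(p):
    """Coefficients of x*p(x), where p = [c0, c1, ..., cn]."""
    # Multiplying by x shifts every coefficient up one degree.
    return [0] + list(p)

def poly_add(p, q):
    """Coefficientwise sum, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_scale(k, p):
    """Coefficients of k*p(x)."""
    return [k * c for c in p]

p1 = [1, 2, 3]    # 1 + 2x + 3x^2 (sample polynomial)
p2 = [4, 0, -1]   # 4 - x^2      (sample polynomial)
k = 5

# Additivity and homogeneity, checked on these samples only.
additive = times_x(poly_add(p1, p2)) == poly_add(times_x(p1), times_x(p2))
homogeneous = times_x(poly_scale(k, p1)) == poly_scale(k, times_x(p1))
print(times_x(p1), additive, homogeneous)
```

A check on sample inputs does not prove linearity; the proof is the computation given in the example.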


EXAMPLE 6 A Linear Transformation Using an Inner Product


Let V be an inner product space, let $v_0$ be any fixed vector in V, and let $T: V \to R$ be the transformation
$T(x) = \langle x, v_0 \rangle$


that maps a vector x into its inner product with $v_0$. This transformation is linear, for if k is any scalar, and
if u and v are any vectors in V, then it follows from properties of inner products that


$T(ku) = \langle ku, v_0 \rangle = k\langle u, v_0 \rangle = kT(u)$
$T(u + v) = \langle u + v, v_0 \rangle = \langle u, v_0 \rangle + \langle v, v_0 \rangle = T(u) + T(v)$


EXAMPLE 7 Transformations on Matrix Spaces


Let $M_{nn}$ be the vector space of $n \times n$ matrices. In each part determine whether the transformation is
linear.


(a) $T_1(A) = A^T$
(b) $T_2(A) = \det(A)$


Solution
(a) It follows from parts (b) and (d) of Theorem 1.4.8 that


$T_1(kA) = (kA)^T = kA^T = kT_1(A)$
$T_1(A + B) = (A + B)^T = A^T + B^T = T_1(A) + T_1(B)$


so $T_1$ is linear.
(b) It follows from Formula 1 of Section 2.3 that
$T_2(kA) = \det(kA) = k^n \det(A) = k^n T_2(A)$
Thus, $T_2$ is not homogeneous and hence not linear if $n > 1$. Note that additivity also fails
because we showed in Example 1 of Section 2.3 that $\det(A + B)$ and $\det(A) + \det(B)$ are not
generally equal.
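
As a small numerical illustration of Example 7, the sketch below checks the two linearity properties for the transpose and shows them failing for the determinant on $2 \times 2$ matrices. The sample matrices A and B, the scalar k, and the helper names are assumptions chosen for illustration.

```python
def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def det2(A):
    """Determinant of a 2 x 2 matrix."""
    (a, b), (c, d) = A
    return a * d - b * c

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(k, A):
    return [[k * x for x in row] for row in A]

A = [[1, 2], [3, 4]]    # sample matrix, det(A) = -2
B = [[0, 1], [5, -2]]   # sample matrix, det(B) = -5
k = 3

# The transpose satisfies both properties on these samples.
transpose_additive = transpose(add(A, B)) == add(transpose(A), transpose(B))
transpose_homogeneous = transpose(scale(k, A)) == scale(k, transpose(A))

# The determinant scales by k^n (here n = 2), not by k.
det_scaled = det2(scale(k, A))
det_homogeneous = det_scaled == k * det2(A)
det_additive = det2(add(A, B)) == det2(A) + det2(B)
print(transpose_additive, transpose_homogeneous, det_homogeneous, det_additive)
```

A single counterexample is enough to show that the determinant is not linear, whereas the transpose checks only illustrate the general proof in part (a).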


EXAMPLE 8 Translation Is Not Linear


Part (a) of Theorem 8.1.1 states that a linear transformation maps 0 to 0. This property is useful for
identifying transformations that are not linear. For example, if $x_0$ is a fixed nonzero vector in $R^2$, then
the transformation

$T(x) = x + x_0$
has the geometric effect of translating each point x in a direction parallel to $x_0$ through a distance of
$\|x_0\|$ (Figure 8.1.2). This cannot be a linear transformation since $T(0) = x_0$, so T does not map 0 to 0.


Figure 8.1.2 $T(x) = x + x_0$ translates each point x along a line parallel to $x_0$ through a distance
$\|x_0\|$.


EXAMPLE 9 The Evaluation Transformation


Let V be a subspace of $F(-\infty, \infty)$, let
$x_1, x_2, \ldots, x_n$
be distinct real numbers, and let $T: V \to R^n$ be the transformation


$T(f) = (f(x_1), f(x_2), \ldots, f(x_n))$ (2)


that associates with f the n-tuple of function values at $x_1, x_2, \ldots, x_n$. We call this the evaluation
transformation on V at $x_1, x_2, \ldots, x_n$. Thus, for example, if

$x_1 = -1$, $x_2 = 2$, $x_3 = 4$
and if $f(x) = x^2 - 1$, then


$T(f) = (f(x_1), f(x_2), f(x_3)) = (0, 3, 15)$


The evaluation transformation in (2) is linear, for if k is any scalar, and if f and g are any functions in V,
then
$T(kf) = ((kf)(x_1), (kf)(x_2), \ldots, (kf)(x_n))$
$= (kf(x_1), kf(x_2), \ldots, kf(x_n))$
$= k(f(x_1), f(x_2), \ldots, f(x_n)) = kT(f)$
and


$T(f + g) = ((f + g)(x_1), (f + g)(x_2), \ldots, (f + g)(x_n))$
$= (f(x_1) + g(x_1), f(x_2) + g(x_2), \ldots, f(x_n) + g(x_n))$
$= (f(x_1), f(x_2), \ldots, f(x_n)) + (g(x_1), g(x_2), \ldots, g(x_n))$
$= T(f) + T(g)$
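
The evaluation transformation of Example 9 translates almost directly into code. The sketch below uses the sample points $-1, 2, 4$ and the function $f(x) = x^2 - 1$ from the example; the second function g and the name `evaluation` are assumptions added for illustration.

```python
def evaluation(points):
    """Return the evaluation transformation T at the given sample points."""
    def T(f):
        # T sends a function to the tuple of its values at the points.
        return tuple(f(x) for x in points)
    return T

T = evaluation([-1, 2, 4])

f = lambda x: x**2 - 1   # the function used in Example 9
g = lambda x: 3 * x + 2  # a second function, chosen only for illustration

print(T(f))                      # (0, 3, 15)
print(T(lambda x: f(x) + g(x)))  # equals T(f) + T(g) componentwise
```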


Finding Linear Transformations from Images of Basis Vectors 


We saw in Formula (12) of Section 4.9 that if $T: R^n \to R^m$ is a matrix transformation, say multiplication by A, and
if $e_1, e_2, \ldots, e_n$ are the standard basis vectors for $R^n$, then A can be expressed as


$A = [T(e_1) \mid T(e_2) \mid \cdots \mid T(e_n)]$
It follows from this that the image of any vector $v = (c_1, c_2, \ldots, c_n)$ in $R^n$ under multiplication by A can be
expressed as
$T(v) = c_1 T(e_1) + c_2 T(e_2) + \cdots + c_n T(e_n)$
This formula tells us that for a matrix transformation the image of any vector is expressible as a linear combination
of the images of the standard basis vectors. This is a special case of the following more general result.


THEOREM 8.1.2 


Let $T: V \to W$ be a linear transformation, where V is finite-dimensional. If $S = \{v_1, v_2, \ldots, v_n\}$ is a basis
for V, then the image of any vector v in V can be expressed as
$T(v) = c_1 T(v_1) + c_2 T(v_2) + \cdots + c_n T(v_n)$ (3)


where $c_1, c_2, \ldots, c_n$ are the coefficients required to express v as a linear combination of the vectors in S.


Proof Express v as $v = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n$ and use the linearity of T.


EXAMPLE 10 Computing with Images of Basis Vectors


Consider the basis $S = \{v_1, v_2, v_3\}$ for $R^3$, where
$v_1 = (1, 1, 1)$, $v_2 = (1, 1, 0)$, $v_3 = (1, 0, 0)$
Let $T: R^3 \to R^2$ be the linear transformation for which
$T(v_1) = (1, 0)$, $T(v_2) = (2, -1)$, $T(v_3) = (4, 3)$
Find a formula for $T(x_1, x_2, x_3)$, and then use that formula to compute $T(2, -3, 5)$.


Solution We first need to express $x = (x_1, x_2, x_3)$ as a linear combination of $v_1$, $v_2$, and $v_3$. If we
write


$(x_1, x_2, x_3) = c_1(1, 1, 1) + c_2(1, 1, 0) + c_3(1, 0, 0)$


then on equating corresponding components, we obtain


$c_1 + c_2 + c_3 = x_1$
$c_1 + c_2 = x_2$
$c_1 = x_3$


which yields $c_1 = x_3$, $c_2 = x_2 - x_3$, $c_3 = x_1 - x_2$, so


$(x_1, x_2, x_3) = x_3(1, 1, 1) + (x_2 - x_3)(1, 1, 0) + (x_1 - x_2)(1, 0, 0)$
$= x_3 v_1 + (x_2 - x_3)v_2 + (x_1 - x_2)v_3$
Thus
$T(x_1, x_2, x_3) = x_3 T(v_1) + (x_2 - x_3)T(v_2) + (x_1 - x_2)T(v_3)$
$= x_3(1, 0) + (x_2 - x_3)(2, -1) + (x_1 - x_2)(4, 3)$
$= (4x_1 - 2x_2 - x_3, 3x_1 - 4x_2 + x_3)$


From this formula, we obtain
$T(2, -3, 5) = (9, 23)$
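
The procedure of Example 10 (find the coordinates of x relative to the basis, then combine the images of the basis vectors as in Formula (3)) can be sketched as follows. The function names `coordinates` and `make_T` are hypothetical, and the coordinates are found by Gauss-Jordan elimination with exact fractions rather than by solving the triangular system by hand.

```python
from fractions import Fraction

def coordinates(basis, x):
    """Solve c1*v1 + ... + cn*vn = x for (c1, ..., cn); basis must be a basis of R^n."""
    n = len(basis)
    # Augmented matrix whose columns are the basis vectors, with x as last column.
    M = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(x[i])]
         for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [e / M[col][col] for e in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def make_T(basis, images):
    """Linear transformation determined by the images of the basis vectors."""
    def T(x):
        c = coordinates(basis, x)
        m = len(images[0])
        # Formula (3): T(x) = c1*T(v1) + ... + cn*T(vn).
        return tuple(sum(ci * w[i] for ci, w in zip(c, images)) for i in range(m))
    return T

basis = [(1, 1, 1), (1, 1, 0), (1, 0, 0)]   # v1, v2, v3 from Example 10
images = [(1, 0), (2, -1), (4, 3)]          # T(v1), T(v2), T(v3)
T = make_T(basis, images)
result = T((2, -3, 5))
print(result == (9, 23))
```

For the vector $(2, -3, 5)$ the coordinates are $c_1 = 5$, $c_2 = -8$, $c_3 = 5$, in agreement with $c_1 = x_3$, $c_2 = x_2 - x_3$, $c_3 = x_1 - x_2$.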
CALCULUS REQUIRED 


EXAMPLE 11 A Linear Transformation from $C^1(-\infty, \infty)$ to $F(-\infty, \infty)$


Let $V = C^1(-\infty, \infty)$ be the vector space of functions with continuous first derivatives on $(-\infty, \infty)$, and let
$W = F(-\infty, \infty)$ be the vector space of all real-valued functions defined on $(-\infty, \infty)$. Let $D: V \to W$ be the
transformation that maps a function $f = f(x)$ into its derivative, that is,


$D(f) = f'(x)$
From the properties of differentiation, we have
$D(f + g) = D(f) + D(g)$ and $D(kf) = kD(f)$


Thus, D is a linear transformation.


CALCULUS REQUIRED 


EXAMPLE 12 An Integral Transformation


Let $V = C(-\infty, \infty)$ be the vector space of continuous functions on the interval $(-\infty, \infty)$, let
$W = C^1(-\infty, \infty)$ be the vector space of functions with continuous first derivatives on $(-\infty, \infty)$, and
let $J: V \to W$ be the transformation that maps a function f in V into
$J(f) = \int_0^x f(t)\,dt$


For example, if $f(x) = x^2$, then


$J(f) = \int_0^x t^2\,dt = \left[\frac{t^3}{3}\right]_0^x = \frac{x^3}{3}$
The transformation $J: V \to W$ is linear, for if k is any constant, and if f and g are any functions in V, then
properties of the integral imply that


$J(kf) = \int_0^x kf(t)\,dt = k\int_0^x f(t)\,dt = kJ(f)$


$J(f + g) = \int_0^x (f(t) + g(t))\,dt = \int_0^x f(t)\,dt + \int_0^x g(t)\,dt = J(f) + J(g)$


Kernel and Range 


Recall that if A is an $m \times n$ matrix, then the null space of A consists of all vectors x in $R^n$ such that $Ax = 0$, and by
Theorem 4.7.1 the column space of A consists of all vectors b in $R^m$ for which there is at least one vector x in $R^n$
such that $Ax = b$. From the viewpoint of matrix transformations, the null space of A consists of all vectors in $R^n$
that multiplication by A maps into 0, and the column space of A consists of all vectors in $R^m$ that are images of at
least one vector in $R^n$ under multiplication by A. The following definition extends these ideas to general linear
transformations.


DEFINITION 2 


If $T: V \to W$ is a linear transformation, then the set of vectors in V that T maps into 0 is called the kernel of
T and is denoted by ker(T). The set of all vectors in W that are images under T of at least one vector in V is
called the range of T and is denoted by R(T).


EXAMPLE 13 Kernel and Range of a Matrix Transformation


If $T_A: R^n \to R^m$ is multiplication by the $m \times n$ matrix A, then, as discussed above, the kernel of $T_A$ is
the null space of A, and the range of $T_A$ is the column space of A.


EXAMPLE 14 Kernel and Range of the Zero Transformation


Let $T: V \to W$ be the zero transformation. Since T maps every vector in V into 0, it follows that
ker(T) = V. Moreover, since 0 is the only image under T of vectors in V, it follows that R(T) = {0}.


EXAMPLE 15 Kernel and Range of the Identity Operator


Let $I: V \to V$ be the identity operator. Since $I(v) = v$ for all vectors in V, every vector in V is the image
of some vector (namely, itself); thus R(I) = V. Since the only vector that I maps into 0 is 0, it follows
that ker(I) = {0}.


EXAMPLE 16 Kernel and Range of an Orthogonal Projection


Let $T: R^3 \to R^3$ be the orthogonal projection on the xy-plane. As illustrated in Figure 8.1.3a, the points
that T maps into 0 = (0, 0, 0) are precisely those on the z-axis, so ker(T) is the set of points of the form
(0, 0, z). As illustrated in Figure 8.1.3b, T maps the points in $R^3$ to the xy-plane, where each point in that
plane is the image of each point on the vertical line above it. Thus, R(T) is the set of points of the form
(x, y, 0).


Figure 8.1.3 (a) ker(T) is the z-axis. (b) R(T) is the entire xy-plane.


EXAMPLE 17 Kernel and Range of a Rotation


Let $T: R^2 \to R^2$ be the linear operator that rotates each vector in the xy-plane through the angle $\theta$ (Figure
8.1.4). Since every vector in the xy-plane can be obtained by rotating some vector through the angle $\theta$, it
follows that $R(T) = R^2$. Moreover, the only vector that rotates into 0 is 0, so ker(T) = {0}.


Figure 8.1.4 


CALCULUS REQUIRED 


EXAMPLE 18 Kernel of a Differentiation Transformation


Let $V = C^1(-\infty, \infty)$ be the vector space of functions with continuous first derivatives on $(-\infty, \infty)$,
let $W = F(-\infty, \infty)$ be the vector space of all real-valued functions defined on $(-\infty, \infty)$, and let
$D: V \to W$ be the differentiation transformation $D(f) = f'(x)$. The kernel of D is the set of functions in
V with derivative zero. From calculus, this is the set of constant functions on $(-\infty, \infty)$.


Properties of Kernel and Range 


In all of the preceding examples, ker(T) and R(T) turned out to be subspaces. In Example 14, Example 15, and
Example 17 they were either the zero subspace or the entire vector space. In Example 16 the kernel was a line
through the origin, and the range was a plane through the origin, both of which are subspaces of $R^3$. All of this is a
consequence of the following general theorem.


THEOREM 8.1.3 


If $T: V \to W$ is a linear transformation, then:
(a) The kernel of T is a subspace of V.
(b) The range of T is a subspace of W.


Proof (a) To show that ker(T) is a subspace, we must show that it contains at least one vector and is closed under
addition and scalar multiplication. By part (a) of Theorem 8.1.1, the vector 0 is in ker(T), so the kernel contains at
least one vector. Let $v_1$ and $v_2$ be vectors in ker(T), and let k be any scalar. Then


$T(v_1 + v_2) = T(v_1) + T(v_2) = 0 + 0 = 0$
so $v_1 + v_2$ is in ker(T). Also,


$T(kv_1) = kT(v_1) = k0 = 0$


so $kv_1$ is in ker(T).


Proof (b) To show that R(T) is a subspace of W, we must show that it contains at least one vector and is closed
under addition and scalar multiplication. However, it contains at least the zero vector of W since $T(0) = 0$ by
part (a) of Theorem 8.1.1. To prove that it is closed under addition and scalar multiplication, we must show that if
$w_1$ and $w_2$ are vectors in R(T), and if k is any scalar, then there exist vectors a and b in V for which


$T(a) = w_1 + w_2$ and $T(b) = kw_1$ (4)


But the fact that $w_1$ and $w_2$ are in R(T) tells us that there exist vectors $v_1$ and $v_2$ in V such that
$T(v_1) = w_1$ and $T(v_2) = w_2$


The following computations complete the proof by showing that the vectors $a = v_1 + v_2$ and $b = kv_1$ satisfy the
equations in (4):


$T(a) = T(v_1 + v_2) = T(v_1) + T(v_2) = w_1 + w_2$
$T(b) = T(kv_1) = kT(v_1) = kw_1$

CALCULUS REQUIRED 


EXAMPLE 19 Application to Differential Equations


Differential equations of the form
$y'' + \omega^2 y = 0$ ($\omega$ a positive constant) (5)


arise in the study of vibrations. The set of all solutions of this equation on the interval $(-\infty, \infty)$ is the
kernel of the linear transformation $D: C^2(-\infty, \infty) \to C(-\infty, \infty)$, given by


$D(y) = y'' + \omega^2 y$
It is proved in standard textbooks on differential equations that the kernel is a two-dimensional subspace
of $C^2(-\infty, \infty)$, so that if we can find two linearly independent solutions of (5), then all other solutions
can be expressed as linear combinations of those two. We leave it for you to confirm by differentiating
that


$y_1 = \cos \omega x$ and $y_2 = \sin \omega x$
are solutions of (5). These functions are linearly independent since neither is a scalar multiple of the other,
and thus


$y = c_1 \cos \omega x + c_2 \sin \omega x$ (6)


is a “general solution” of (5) in the sense that every choice of $c_1$ and $c_2$ produces a solution, and every
solution is of this form.
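
The differentiation left to the reader in Example 19 can be written out as follows. This is only the routine calculus check, together with the observation that linearity of D carries it over to every combination of the form (6); it does not show that the kernel is two-dimensional, which the example takes from the theory of differential equations.

```latex
\begin{aligned}
y_1 &= \cos \omega x, & y_1' &= -\omega \sin \omega x, & y_1'' &= -\omega^2 \cos \omega x = -\omega^2 y_1,\\
y_2 &= \sin \omega x, & y_2' &= \omega \cos \omega x,  & y_2'' &= -\omega^2 \sin \omega x = -\omega^2 y_2,
\end{aligned}
\qquad\text{so}\qquad D(y_1) = D(y_2) = 0.

D(c_1 y_1 + c_2 y_2) = c_1 D(y_1) + c_2 D(y_2) = c_1 \cdot 0 + c_2 \cdot 0 = 0 .
```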


Rank and Nullity of Linear Transformations 


In Definition 1 of Section 4.8 we defined the notions of rank and nullity for an $m \times n$ matrix, and in Theorem 4.8.2,
which we called the Dimension Theorem, we proved that the sum of the rank and nullity is n. We will show next
that this result is a special case of a more general result about linear transformations. We start with the following
definition.


DEFINITION 3 


Let $T: V \to W$ be a linear transformation. If the range of T is finite-dimensional, then its dimension is called
the rank of T; and if the kernel of T is finite-dimensional, then its dimension is called the nullity of T. The
rank of T is denoted by rank(T) and the nullity of T by nullity(T).


The following theorem, whose proof is optional, generalizes Theorem 4.8.2. 


THEOREM 8.1.4 Dimension Theorem for Linear Transformations 


If $T: V \to W$ is a linear transformation from an n-dimensional vector space V to a vector space W, then


rank(T) + nullity(T) = n (7)


In the special case where A is an $m \times n$ matrix and $T_A: R^n \to R^m$ is multiplication by A, the kernel of $T_A$ is the null
space of A, and the range of $T_A$ is the column space of A. Thus, it follows from Theorem 8.1.4 that
rank($T_A$) + nullity($T_A$) = n
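
For a matrix transformation, Formula (7) can be explored numerically. The sketch below computes the rank of the coefficient matrix of the transformation $T: R^4 \to R^3$ that appears in Exercise 16 of this section, using row reduction with exact fractions. The function name `rank` is an assumption of this sketch, and the nullity is deduced from Formula (7) rather than computed independently; one kernel vector, taken from Exercise 17(a), is then checked directly.

```python
from fractions import Fraction

def rank(A):
    """Number of pivots found by forward elimination (exact arithmetic)."""
    M = [[Fraction(e) for e in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0  # index of the next pivot row
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

# Coefficient matrix of the transformation in Exercise 16.
A = [[4, 1, -2, -3],
     [2, 1, 1, -4],
     [6, 0, -9, 9]]

n = len(A[0])               # dimension of the domain R^4
rank_A = rank(A)
nullity_A = n - rank_A      # deduced from Formula (7)

# A vector from Exercise 17(a) that should lie in the kernel.
x = (3, -8, 2, 0)
Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
print(rank_A, nullity_A, Ax)
```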
OPTIONAL 


Proof of Theorem 8.1.4 We must show that


dim(R(T)) + dim(ker(T)) = n
We will give the proof for the case where $1 \le$ dim(ker(T)) $< n$. The cases where dim(ker(T)) = 0 and
dim(ker(T)) = n are left as exercises. Assume dim(ker(T)) = r, and let $v_1, \ldots, v_r$ be a basis for the kernel. Since
$\{v_1, \ldots, v_r\}$ is linearly independent, Theorem 4.5.5b states that there are $n - r$ vectors, $v_{r+1}, \ldots, v_n$, such that the
extended set $\{v_1, \ldots, v_r, v_{r+1}, \ldots, v_n\}$ is a basis for V. To complete the proof, we will show that the $n - r$ vectors
in the set $S = \{T(v_{r+1}), \ldots, T(v_n)\}$ form a basis for the range of T. It will then follow that
dim(R(T)) + dim(ker(T)) = $(n - r) + r = n$


First we show that S spans the range of T. If b is any vector in the range of T, then $b = T(v)$ for some vector v in
V. Since $\{v_1, \ldots, v_r, v_{r+1}, \ldots, v_n\}$ is a basis for V, the vector v can be written in the form
$v = c_1 v_1 + \cdots + c_r v_r + c_{r+1} v_{r+1} + \cdots + c_n v_n$


Since $v_1, \ldots, v_r$ lie in the kernel of T, we have $T(v_1) = \cdots = T(v_r) = 0$, so


$b = T(v) = c_{r+1} T(v_{r+1}) + \cdots + c_n T(v_n)$
Thus S spans the range of T.


Finally, we show that S is a linearly independent set and consequently forms a basis for the range of T. Suppose that
some linear combination of the vectors in S is zero; that is,


$k_{r+1} T(v_{r+1}) + \cdots + k_n T(v_n) = 0$ (8)
We must show that $k_{r+1} = \cdots = k_n = 0$. Since T is linear, (8) can be rewritten as
$T(k_{r+1} v_{r+1} + \cdots + k_n v_n) = 0$

which says that $k_{r+1} v_{r+1} + \cdots + k_n v_n$ is in the kernel of T. This vector can therefore be written as a linear
combination of the basis vectors $\{v_1, \ldots, v_r\}$, say

$k_{r+1} v_{r+1} + \cdots + k_n v_n = k_1 v_1 + \cdots + k_r v_r$
Thus,

$k_1 v_1 + \cdots + k_r v_r - k_{r+1} v_{r+1} - \cdots - k_n v_n = 0$

Since $\{v_1, \ldots, v_n\}$ is linearly independent, all of the k's are zero; in particular, $k_{r+1} = \cdots = k_n = 0$, which
completes the proof.


Concept Review 


• Linear transformation
• Linear operator
• Zero transformation
• Identity operator
• Contraction
• Dilation
• Evaluation transformation
• Kernel
• Range
• Rank
• Nullity


Skills

• Determine whether a function is a linear transformation.
• Find a formula for a linear transformation $T: V \to W$ given the values of T on a basis for V.
• Find a basis for the kernel of a linear transformation.
• Find a basis for the range of a linear transformation.
• Find the rank of a linear transformation.
• Find the nullity of a linear transformation.


Exercise Set 8.1 


In Exercises 1-8, determine whether the function is a linear transformation. Justify your answer.
1. $T: V \to R$, where V is an inner product space, and $T(u) = \|u\|$.

Answer:

Nonlinear


2. $T: R^3 \to R^3$, where $v_0$ is a fixed vector in $R^3$ and $T(u) = u \times v_0$.
3. $T: M_{22} \to M_{23}$, where B is a fixed $2 \times 3$ matrix and $T(A) = AB$.


Answer:


Linear
4. $T: M_{nn} \to R$, where $T(A) = \operatorname{tr}(A)$.
5. $F: M_{mn} \to M_{nm}$, where $F(A) = A^T$.


Answer:


Linear


6. $T: M_{22} \to R$, where


(a) ap 2 |)a3- ee te-4 
ca 


(b) pffa b]\_ 2 92 
if b))-24 


7. $T: P_2 \to P_2$, where
(a) $T(a_0 + a_1 x + a_2 x^2) = a_0 + a_1(x + 1) + a_2(x + 1)^2$


(b) $T(a_0 + a_1 x + a_2 x^2) = (a_0 + 1) + (a_1 + 1)x + (a_2 + 1)x^2$
Answer:


(a) Linear
(b) Nonlinear
8. $T: F(-\infty, \infty) \to F(-\infty, \infty)$, where
(a) $T(f(x)) = 1 + f(x)$
(b) $T(f(x)) = f(x + 1)$
9. Consider the basis $S = \{v_1, v_2\}$ for $R^2$, where $v_1 = (1, 1)$ and $v_2 = (1, 0)$, and let $T: R^2 \to R^2$ be the linear
operator for which
$T(v_1) = (1, -2)$ and $T(v_2) = (-4, 1)$
Find a formula for $T(x_1, x_2)$, and use that formula to find $T(5, -3)$.


Answer:


$T(x_1, x_2) = (-4x_1 + 5x_2, x_1 - 3x_2)$, $T(5, -3) = (-35, 14)$
10. Consider the basis $S = \{v_1, v_2\}$ for $R^2$, where $v_1 = (-2, 1)$ and $v_2 = (1, 3)$, and let $T: R^2 \to R^3$ be the
linear transformation such that
$T(v_1) = (-1, 2, 0)$ and $T(v_2) = (0, -3, 5)$
Find a formula for $T(x_1, x_2)$, and use that formula to find $T(2, -3)$.
11. Consider the basis $S = \{v_1, v_2, v_3\}$ for $R^3$, where $v_1 = (1, 1, 1)$, $v_2 = (1, 1, 0)$, and $v_3 = (1, 0, 0)$, and let
$T: R^3 \to R^3$ be the linear operator for which
$T(v_1) = (2, -1, 4)$, $T(v_2) = (3, 0, 1)$,
$T(v_3) = (-1, 5, 1)$
Find a formula for $T(x_1, x_2, x_3)$, and use that formula to find $T(2, 4, -1)$.


Answer:


$T(x_1, x_2, x_3) = (-x_1 + 4x_2 - x_3, 5x_1 - 5x_2 - x_3, x_1 + 3x_3)$, $T(2, 4, -1) = (15, -9, -1)$
12. Consider the basis $S = \{v_1, v_2, v_3\}$ for $R^3$, where $v_1 = (1, 2, 1)$, $v_2 = (2, 9, 0)$, and $v_3 = (3, 3, 4)$, and let
$T: R^3 \to R^2$ be the linear transformation for which


$T(v_1) = (1, 0)$, $T(v_2) = (-1, 1)$, $T(v_3) = (0, 1)$
Find a formula for $T(x_1, x_2, x_3)$, and use that formula to find $T(7, 13, 7)$.
13. Let $v_1$, $v_2$, and $v_3$ be vectors in a vector space V, and let $T: V \to R^3$ be a linear transformation for which
$T(v_1) = (1, -1, 2)$, $T(v_2) = (0, 3, 2)$,
$T(v_3) = (-3, 1, 2)$
Find $T(2v_1 - 3v_2 + 4v_3)$.
Answer:
$T(2v_1 - 3v_2 + 4v_3) = (-10, -7, 6)$
14. Let $T: R^2 \to R^2$ be the linear operator given by the formula
$T(x, y) = (2x - y, -8x + 4y)$
Which of the following vectors are in R(T)?
(a) (1, -4)
(b) (5, 9)
(c) (-3, 12)


15. Let $T: R^2 \to R^2$ be the linear operator in Exercise 14. Which of the following vectors are in ker(T)?
(a) (5, 10)
(b) (, 2) 
(c) (1,1) 


Answer:


(a)


16. Let $T: R^4 \to R^3$ be the linear transformation given by the formula
$T(x_1, x_2, x_3, x_4) = (4x_1 + x_2 - 2x_3 - 3x_4, 2x_1 + x_2 + x_3 - 4x_4, 6x_1 - 9x_3 + 9x_4)$
Which of the following are in R(T)?
(a) (0, 0, 6)
(b) (1, 3, 0)
(c) (2, 4, 1)


17. Let $T: R^4 \to R^3$ be the linear transformation in Exercise 16. Which of the following are in ker(T)?
(a) (3, -8, 2, 0)

(b) (9, 9, 9, 1)

(c) (9, -4, 1, 0)


Answer:


(a)

18. Let $T: P_2 \to P_3$ be the linear transformation defined by $T(p(x)) = xp(x)$. Which of the following are in
ker(T)?

(a) $x^2$

(b) 0

(c) $1 + x$

19. Let $T: P_2 \to P_3$ be the linear transformation in Exercise 18. Which of the following are in R(T)?

(a) $x + x^2$

(b) $1 + x$

(c)


Answer:


(a)

20. Find a basis for the kernel of

(a) the linear operator in Exercise 14.

(b) the linear transformation in Exercise 16.

(c) the linear transformation in Exercise 18.


21. Find a basis for the range of
(a) the linear operator in Exercise 14.
(b) the linear transformation in Exercise 16.
(c) the linear transformation in Exercise 18.


Answer:


(a) (1, -4)
(b) (4, 2, 6), (1, 1, 0), (-3, -4, 9)
(c) $x$, $x^2$, $x^3$


22. Verify Formula 7 of the dimension theorem for 
(a) the linear operator in Exercise 14. 
(b) the linear transformation in Exercise 16. 


(c) the linear transformation in Exercise 18. 


In Exercises 23-26, let T be multiplication by the matrix A. Find 
(a) a basis for the range of T. 

(b) a basis for the kernel of T. 

(c) the rank and nullity of 7. 

(d) the rank and nullity of A. 


(b) | —14 

19 

11 
(c) Rank(7) = 2, nullity(7) = 1 
(d) Rank(.A) = 2, nullity(A) = 1 


24. 20 —1 

A=| 40 =2 

20 0 0 

25 A fa (aes ee 

Les 3 

Answer 

(a) | 1 0 
oy J 1 

(b) | =—1 —4 

—1 2 

1 0 

0 7 


(c) Rank(T) = nullity(T) = 2
(d) Rank(A) = nullity(A) = 2


1 4 5 9
3 =2 1 -
A=
=-1 0 =1 0 =1
a ae co ae |

27. Describe the kernel and range of
(a) the orthogonal projection on the xz-plane.
(b) the orthogonal projection on the yz-plane.
(c) the orthogonal projection on the plane defined by the equation y = x.
Answer:


(a) Kernel: y-axis; range: xz-plane

(b) Kernel: x-axis; range: yz-plane

(c) Kernel: the line through the origin perpendicular to the plane y = x; range: plane y = x

28. Let V be any vector space, and let $T: V \to V$ be defined by $T(v) = 3v$.
(a) What is the kernel of T?
(b) What is the range of T?


29. In each part, use the given information to find the nullity of the linear transformation T.
(a) $T: R^5 \to R^7$ has rank 3.

(b) $T: P_4 \to P_3$ has rank 1.

(c) The range of $T: R^6 \to R^3$ is $R^3$.

(d) $T: M_{22} \to M_{22}$ has rank 3.


Answer:


(a) Nullity(T) = 2

(b) Nullity(T) = 4

(c) Nullity(T) = 3

(d) Nullity(T) = 1

30. Let A be a $7 \times 6$ matrix such that $Ax = 0$ has only the trivial solution, and let $T: R^6 \to R^7$ be multiplication by
A. Find the rank and nullity of T.

31. Let A be a $5 \times 7$ matrix with rank 4.

(a) What is the dimension of the solution space of $Ax = 0$?

(b) Is $Ax = b$ consistent for all vectors b in $R^5$? Explain.


Answer:


(a) 3
(b) No


32. Let $T: R^3 \to W$ be a linear transformation from $R^3$ to any vector space. Give a geometric description of ker(T).


33. Let $T: V \to R^3$ be a linear transformation from any vector space to $R^3$. Give a geometric description of R(T).


Answer:
A line through the origin, a plane through the origin, the origin only, or all of $R^3$

34. Let $T: R^3 \to R^3$ be multiplication by
$\begin{bmatrix} 1 & 3 & 4 \\ 3 & 4 & 7 \\ -2 & 2 & 0 \end{bmatrix}$


(a) Show that the kernel of T is a line through the origin, and find parametric equations for it.


(b) Show that the range of T is a plane through the origin, and find an equation for it.


35. (a) Show that if $a_1$, $a_2$, $b_1$, and $b_2$ are any scalars, then the formula
$F(x, y) = (a_1 x + b_1 y, a_2 x + b_2 y)$


defines a linear operator on $R^2$.


(b) Does the formula $F(x, y) = (a_1 x^2 + b_1 y^2, a_2 x^2 + b_2 y^2)$ define a linear operator on $R^2$? Explain.


Answer:


(b) No

36. Let $\{v_1, v_2, \ldots, v_n\}$ be a basis for a vector space V, and let $T: V \to W$ be a linear transformation. Show that if
$T(v_1) = T(v_2) = \cdots = T(v_n) = 0$
then T is the zero transformation.

37. Let $\{v_1, v_2, \ldots, v_n\}$ be a basis for a vector space V, and let $T: V \to V$ be a linear operator. Show that if
$T(v_1) = v_1$, $T(v_2) = v_2$, ..., $T(v_n) = v_n$
then T is the identity transformation on V.


38. For a positive integer $n > 1$, let $T: M_{nn} \to R$ be the linear transformation defined by $T(A) = \operatorname{tr}(A)$, where A is
an $n \times n$ matrix with real entries. Determine the dimension of ker(T).


39. Prove: If $\{v_1, v_2, \ldots, v_n\}$ is a basis for V and $w_1, w_2, \ldots, w_n$ are vectors in W, not necessarily distinct, then
there exists a linear transformation $T: V \to W$ such that


$T(v_1) = w_1$, $T(v_2) = w_2$, ..., $T(v_n) = w_n$


40. (Calculus required) Let $V = C[a, b]$ be the vector space of functions continuous on $[a, b]$, and let $T: V \to V$
be the transformation defined by


$T(f) = 5f(x) + 3\int_a^x f(t)\,dt$


Is T a linear operator?


41. (Calculus required) Let $D: P_3 \to P_2$ be the differentiation transformation $D(p) = p'(x)$. What is the kernel of
D?


Answer:


ker(D) consists of all constant polynomials.


42. (Calculus required) Let $J: P_1 \to R$ be the integration transformation $J(p) = \int_{-1}^{1} p(x)\,dx$. What is the kernel
of J?


43. (Calculus required) Let V be the vector space of real-valued functions with continuous derivatives of all orders
on the interval $(-\infty, \infty)$, and let $W = F(-\infty, \infty)$ be the vector space of real-valued functions defined on
$(-\infty, \infty)$.
(a) Find a linear transformation $T: V \to W$ whose kernel is $P_3$.


(b) Find a linear transformation $T: V \to W$ whose kernel is $P_n$.


Answer:
(a) $T(f(x)) = f^{(4)}(x)$
(b) $T(f(x)) = f^{(n+1)}(x)$


44. If A is an $m \times n$ matrix, and if the linear system $Ax = b$ is consistent for every vector b in $R^m$, what can you
say about the range of $T_A: R^n \to R^m$?


True-False Exercises 
In parts (a)-(i) determine whether the statement is true or false, and justify your answer.


(a) If $T(c_1 v_1 + c_2 v_2) = c_1 T(v_1) + c_2 T(v_2)$ for all vectors $v_1$ and $v_2$ in V and all scalars $c_1$ and $c_2$, then T is a
linear transformation.


Answer: 


True 


(b) If v is a nonzero vector in V, then there is exactly one linear transformation $T: V \to W$ such that


$T(-v) = -T(v)$.
Answer: 


False 


(c) There is exactly one linear transformation $T: V \to W$ for which $T(u + v) = T(u - v)$ for all vectors u and v in
V.


Answer: 


True 


(d) If $v_0$ is a nonzero vector in V, then the formula $T(v) = v_0 + v$ defines a linear operator on V.
Answer: 


False 


(e) The kernel of a linear transformation is a vector space. 
Answer: 


True 


(f) The range of a linear transformation is a vector space. 
Answer: 


True 


(g) If T: Pg — M3 is a linear transformation, then the nullity of Tis 3. 
Answer: 


False 
(h) The function $T: M_{22} \to R$ defined by $T(A) = \det A$ is a linear transformation.


Answer: 


False 


(i) The linear transformation 7°: A433 — Af33 defined by 


1 3 
T(A) = A 
(4) E 
has rank 1. 
Answer: 
False 




8.2 Isomorphism


In this section we will establish a fundamental connection between real finite-dimensional vector spaces and the Euclidean
space $R^n$. This connection is not only important theoretically, but it has practical applications in that it allows us to perform
vector computations in general vector spaces by working with the vectors in $R^n$.


One-to-One and Onto 


Although many of the theorems in this text have been concerned exclusively with the vector space $R^n$, this is not as limiting
as it might seem. As we will show, the vector space $R^n$ is the “mother” of all real n-dimensional vector spaces in the sense
that any such space might differ from $R^n$ in the notation used to represent vectors, but not in its algebraic structure. To
explain what we mean by this, we will need two definitions, the first of which is a generalization of Definition 1 in Section
4.10. (See Figure 8.2.1.)


DEFINITION 1 


If $T: V \to W$ is a linear transformation from a vector space V to a vector space W, then T is said to be one-to-one if
T maps distinct vectors in V into distinct vectors in W.


DEFINITION 2


If $T: V \to W$ is a linear transformation from a vector space V to a vector space W, then T is said to be onto (or onto
W) if every vector in W is the image of at least one vector in V.


Figure 8.2.1 One-to-one: distinct vectors in V have distinct images in W. Not one-to-one: there exist distinct vectors in
V with the same image. Onto W: every vector in W is the image of some vector in V. Not onto W: not every vector in W
is the image of some vector in V.


The following theorem provides a useful way of telling whether a linear transformation is one-to-one by examining its 
kernel. 


THEOREM 8.2.1 


If $T: V \to W$ is a linear transformation, then the following statements are equivalent.


(a) T is one-to-one.


(b) ker(T) = {0}.


Proof (a) ⇒ (b) Since T is linear, we know that $T(0) = 0$ by Theorem 8.1.1a. Since T is one-to-one, there can be no
other vectors in V that map into 0, so ker(T) = {0}.


(b) ⇒ (a) Assume that ker(T) = {0}. If u and v are distinct vectors in V, then $u - v \ne 0$. This implies that $T(u - v) \ne 0$,
for otherwise ker(T) would contain a nonzero vector. Since T is linear, it follows that


$T(u) - T(v) = T(u - v) \ne 0$


so T maps distinct vectors in V into distinct vectors in W and hence is one-to-one.
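
The contrapositive of Theorem 8.2.1 can be seen concretely with the operator $T(x, y) = (2x - y, -8x + 4y)$ from Exercise 14 of Section 8.1: because its kernel contains a nonzero vector, adding that vector to any input leaves the image unchanged, so T is not one-to-one. In the sketch below, the kernel vector (1, 2) and the test vector u are choices made for illustration.

```python
def T(v):
    """The operator T(x, y) = (2x - y, -8x + 4y) from Exercise 14 of Section 8.1."""
    x, y = v
    return (2 * x - y, -8 * x + 4 * y)

kernel_vector = (1, 2)   # nonzero vector with T(kernel_vector) = (0, 0)
u = (3, 5)               # arbitrary test vector
shifted = (u[0] + kernel_vector[0], u[1] + kernel_vector[1])

image_u = T(u)
image_shifted = T(shifted)

# Two distinct inputs with the same image: T is not one-to-one.
print(T(kernel_vector), u != shifted, image_u == image_shifted)
```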


In the special case where V is finite-dimensional and T is a linear operator on V, then we can add a third statement to those 
in Theorem 8.2.1. 


THEOREM 8.2.2 


If V is a finite-dimensional vector space, and if $T: V \to V$ is a linear operator, then the following statements are
equivalent.


(a) T is one-to-one.
(b) ker(T) = {0}.
(c) T is onto [i.e., R(T) = V].


Proof We already know that (a) and (b) are equivalent by Theorem 8.2.1, so it suffices to show that (b) and (c) are
equivalent. We leave it for you to do this by assuming that dim(V) = n and applying Theorem 8.1.4.


EXAMPLE 1 Dilations and Contractions Are One-to-One and Onto


Show that if V is a finite-dimensional vector space and c is any nonzero scalar, then the linear operator
$T: V \to V$ defined by $T(v) = cv$ is one-to-one and onto.


Solution The operator T is onto (and hence one-to-one), for if v is any vector in V, then that vector is the
image of the vector $(1/c)v$.


EXAMPLE 2 Matrix Operators


If $T_A: R^n \to R^n$ is the matrix operator $T_A(x) = Ax$, then it follows from parts (r) and (s) of Theorem 5.1.6 that
$T_A$ is one-to-one and onto if and only if A is invertible.


EXAMPLE 3 Shifting Operators


Let $V = R^\infty$ be the sequence space discussed in Example 3 of Section 4.1, and consider the linear “shifting
operators” on V defined by


$T_1(u_1, u_2, \ldots, u_n, \ldots) = (0, u_1, u_2, \ldots, u_n, \ldots)$
$T_2(u_1, u_2, \ldots, u_n, \ldots) = (u_2, u_3, \ldots, u_n, \ldots)$


(a) Show that $T_1$ is one-to-one but not onto.


(b) Show that $T_2$ is onto but not one-to-one.


Solution


(a) The operator $T_1$ is one-to-one because distinct sequences in $R^\infty$ obviously have distinct images. This
operator is not onto because no vector in $R^\infty$ maps into the sequence (1, 0, 0, ..., 0, ...), for example.


(b) The operator $T_2$ is not one-to-one because, for example, the vectors (1, 0, 0, ..., 0, ...) and
(2, 0, 0, ..., 0, ...) both map into (0, 0, 0, ..., 0, ...). This operator is onto because every possible
sequence of real numbers can be obtained with an appropriate choice of the numbers $u_2, u_3, \ldots, u_n, \ldots$


Why does Example 3 not violate Theorem 8.2.2?


EXAMPLE 4 Basic Transformations That Are One-to-One and Onto 


The linear transformations $T_1: P_3 \to R^4$ and $T_2: M_{22} \to R^4$ defined by

$T_1(a + bx + cx^2 + dx^3) = (a, b, c, d)$

$T_2\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (a, b, c, d)$

are both one-to-one and onto (verify by showing that their kernels contain only the zero vector).


EXAMPLE 5 A One-to-One Linear Transformation


Let $T: P_n \to P_{n+1}$ be the linear transformation

$T(\mathbf{p}) = T(p(x)) = xp(x)$

discussed in Example 5 of Section 8.1. If

$\mathbf{p} = p(x) = c_0 + c_1 x + \cdots + c_n x^n$ and $\mathbf{q} = q(x) = d_0 + d_1 x + \cdots + d_n x^n$

are distinct polynomials, then they differ in at least one coefficient. Thus,

$T(\mathbf{p}) = c_0 x + c_1 x^2 + \cdots + c_n x^{n+1}$ and $T(\mathbf{q}) = d_0 x + d_1 x^2 + \cdots + d_n x^{n+1}$

also differ in at least one coefficient. It follows that T is one-to-one since it maps distinct polynomials p and q
into distinct polynomials $T(\mathbf{p})$ and $T(\mathbf{q})$.


CALCULUS REQUIRED 
EXAMPLE 6 A Transformation That Is Not One-to-One


Let

$D: C^1(-\infty, \infty) \to F(-\infty, \infty)$

be the differentiation transformation discussed in Example 11 of Section 8.1. This linear transformation is not
one-to-one because it maps functions that differ by a constant into the same function. For example,

$D(x^2) = D(x^2 + 1) = 2x$


Dimension and Linear Transformations 


In the exercises we will ask you to prove the following two important facts about a linear transformation $T: V \to W$ in the
case where V and W are finite-dimensional:


1. If $\dim(W) < \dim(V)$, then T cannot be one-to-one.
2. If $\dim(V) < \dim(W)$, then T cannot be onto.


Stated informally, if a linear transformation maps a “bigger” space to a “smaller” space, then some points in the “bigger” 
space must have the same image; and if a linear transformation maps a “smaller” space to a “bigger” space, then there must 
be points in the “bigger” space that are not images of any points in the “smaller” space. 


Remark These observations tell us, for example, that any linear transformation from $R^3$ to $R^2$ must map some distinct
points of $R^3$ into the same point in $R^2$, and they also tell us that there is no linear transformation that maps $R^2$ onto all of $R^3$.


Isomorphism 


Our next definition paves the way for the main result in this section. 


DEFINITION 3 


If a linear transformation $T: V \to W$ is both one-to-one and onto, then T is said to be an isomorphism, and the
vector spaces V and W are said to be isomorphic. 


The word isomorphic is derived from the Greek words iso, meaning “identical,” and morphe, meaning “form.” This 
terminology is appropriate because, as we will now explain, isomorphic vector spaces have the same “algebraic form,” even 
though they may consist of different kinds of objects. To illustrate this idea, examine Table 1, in which we have shown how
the isomorphism

$a_0 + a_1 x + a_2 x^2 \;\xrightarrow{\;T\;}\; (a_0, a_1, a_2)$

matches up vector operations in $P_2$ and $R^3$.

Table 1
Operation in $P_2$ | Operation in $R^3$
$3(1 - 2x + 3x^2) = 3 - 6x + 9x^2$ | $3(1, -2, 3) = (3, -6, 9)$
$(2 + x - x^2) + (1 - x + 5x^2) = 3 + 4x^2$ | $(2, 1, -1) + (1, -1, 5) = (3, 0, 4)$
$(4 + 2x + 3x^2) - (2 - 4x + 3x^2) = 2 + 6x$ | $(4, 2, 3) - (2, -4, 3) = (2, 6, 0)$
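
The way an isomorphism "matches up" operations can be checked mechanically. The sketch below is illustrative only: it assumes a polynomial in $P_2$ is stored as a dictionary from powers to coefficients, and the helper names (`to_coords`, `poly_add`, and so on) are hypothetical. It confirms, for two sample computations, that operating in $P_2$ and then taking coordinates gives the same result as taking coordinates and then operating in $R^3$.

```python
def to_coords(poly):
    """Map a0 + a1 x + a2 x^2, stored as {power: coefficient}, to (a0, a1, a2)."""
    return tuple(poly.get(k, 0) for k in range(3))

def poly_add(p, q):
    return {k: p.get(k, 0) + q.get(k, 0) for k in range(3)}

def poly_scale(c, p):
    return {k: c * p.get(k, 0) for k in range(3)}

def vec_add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def vec_scale(c, u):
    return tuple(c * a for a in u)

p = {0: 2, 1: 1, 2: -1}   # 2 + x - x^2
q = {0: 1, 1: -1, 2: 5}   # 1 - x + 5x^2
r = {0: 1, 1: -2, 2: 3}   # 1 - 2x + 3x^2

# Add in P2, then take coordinates; versus take coordinates, then add in R^3.
sum_then_map = to_coords(poly_add(p, q))
map_then_sum = vec_add(to_coords(p), to_coords(q))
print(sum_then_map, map_then_sum)

# The same comparison for a scalar multiple.
scale_then_map = to_coords(poly_scale(3, r))
map_then_scale = vec_scale(3, to_coords(r))
print(scale_then_map, map_then_scale)
```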


The following theorem, which is one of the most important results in linear algebra, reveals the fundamental importance of
the vector space $R^n$.


THEOREM 8.2.3


Every real n-dimensional vector space is isomorphic to $R^n$.


Theorem 8.2.3 tells us that a real n-dimensional vector
space may differ from $R^n$ in notation, but its algebraic
structure will be the same.


Proof Let V be a real n-dimensional vector space. To prove that V is isomorphic to $R^n$ we must find a linear
transformation $T: V \to R^n$ that is one-to-one and onto. For this purpose, let

$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$

be any basis for V, let

$\mathbf{u} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n$ (1)

be the representation of a vector u in V as a linear combination of the basis vectors, and define the transformation
$T: V \to R^n$ by

$T(\mathbf{u}) = (k_1, k_2, \ldots, k_n)$ (2)

We will show that T is an isomorphism (linear, one-to-one, and onto). To prove the linearity, let u and v be vectors in V, let
c be a scalar, and let

$\mathbf{u} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n$ and $\mathbf{v} = d_1\mathbf{v}_1 + d_2\mathbf{v}_2 + \cdots + d_n\mathbf{v}_n$ (3)

be the representations of u and v as linear combinations of the basis vectors. Then it follows from 2 that

$T(c\mathbf{u}) = T(ck_1\mathbf{v}_1 + ck_2\mathbf{v}_2 + \cdots + ck_n\mathbf{v}_n)$
$= (ck_1, ck_2, \ldots, ck_n)$
$= c(k_1, k_2, \ldots, k_n) = cT(\mathbf{u})$

and it follows from 2 that

$T(\mathbf{u} + \mathbf{v}) = T((k_1 + d_1)\mathbf{v}_1 + (k_2 + d_2)\mathbf{v}_2 + \cdots + (k_n + d_n)\mathbf{v}_n)$
$= (k_1 + d_1, k_2 + d_2, \ldots, k_n + d_n)$
$= (k_1, k_2, \ldots, k_n) + (d_1, d_2, \ldots, d_n)$
$= T(\mathbf{u}) + T(\mathbf{v})$

which shows that T is linear. To show that T is one-to-one, we must show that if u and v are distinct vectors in V, then so are
their images in $R^n$. But if $\mathbf{u} \neq \mathbf{v}$, and if the representations of these vectors in terms of the basis vectors are as in 3, then we
must have $k_i \neq d_i$ for at least one i. Thus,

$T(\mathbf{u}) = (k_1, k_2, \ldots, k_n) \neq (d_1, d_2, \ldots, d_n) = T(\mathbf{v})$

which shows that u and v have distinct images under T. Finally, the transformation T is onto, for if

$\mathbf{w} = (k_1, k_2, \ldots, k_n)$

is any vector in $R^n$, then it follows from 2 that w is the image under T of the vector

$\mathbf{u} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n$

Remark Note that the isomorphism T in Formula 2 of the foregoing proof is the coordinate map

$\mathbf{u} \;\xrightarrow{\;T\;}\; (k_1, k_2, \ldots, k_n) = (\mathbf{u})_S$

that maps u into its coordinate vector with respect to the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$. Since there are generally many
possible bases for a given vector space V, there are generally many possible isomorphisms between V and $R^n$, one for each
different basis.


EXAMPLE 7 The Natural Isomorphism from $P_{n-1}$ to $R^n$


We leave it for you to verify that the mapping

$a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} \;\xrightarrow{\;T\;}\; (a_0, a_1, \ldots, a_{n-1})$

from $P_{n-1}$ to $R^n$ is one-to-one, onto, and linear. This is called the natural isomorphism from $P_{n-1}$ to $R^n$
because, as the following computations show, it maps the natural basis $\{1, x, x^2, \ldots, x^{n-1}\}$ for $P_{n-1}$ into the
standard basis for $R^n$:

$1 = 1 + 0x + 0x^2 + \cdots + 0x^{n-1} \;\xrightarrow{\;T\;}\; (1, 0, 0, \ldots, 0)$
$x = 0 + x + 0x^2 + \cdots + 0x^{n-1} \;\xrightarrow{\;T\;}\; (0, 1, 0, \ldots, 0)$
$\vdots$
$x^{n-1} = 0 + 0x + 0x^2 + \cdots + x^{n-1} \;\xrightarrow{\;T\;}\; (0, 0, 0, \ldots, 1)$


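EXAMPLE 8 The Natural Isomorphism from $M_{22}$ to $R^4$


The matrices

$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$

form a basis for the vector space $M_{22}$ of 2 × 2 matrices. An isomorphism $T: M_{22} \to R^4$ can be constructed by
first writing a matrix A in $M_{22}$ in terms of the basis vectors as

$A = \begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \end{bmatrix} = a_1\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + a_2\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + a_3\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + a_4\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$

and then defining T as

$T(A) = (a_1, a_2, a_3, a_4)$

Thus, for example,

$\begin{bmatrix} 1 & -3 \\ 4 & 6 \end{bmatrix} \;\xrightarrow{\;T\;}\; (1, -3, 4, 6)$

More generally, this idea can be used to show that the vector space $M_{mn}$ of m × n matrices with real entries is
isomorphic to $R^{mn}$.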


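EXAMPLE 9 Differentiation by Matrix Multiplication


Consider the differentiation transformation $D: P_3 \to P_2$ on the vector space of polynomials of degree three or
less. If we map $P_3$ and $P_2$ into $R^4$ and $R^3$, respectively, by the natural isomorphisms, then the transformation D
produces a corresponding matrix transformation from $R^4$ to $R^3$. Specifically, the derivative transformation

$a_0 + a_1 x + a_2 x^2 + a_3 x^3 \;\xrightarrow{\;D\;}\; a_1 + 2a_2 x + 3a_3 x^2$

produces the matrix transformation

$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} a_1 \\ 2a_2 \\ 3a_3 \end{bmatrix}$

Thus, for example, the derivative

$\dfrac{d}{dx}(2 + x + 4x^2 - x^3) = 1 + 8x - 3x^2$

can be calculated as the matrix product

$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \\ 4 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 8 \\ -3 \end{bmatrix}$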


This idea is useful for constructing numerical algorithms to perform derivative calculations. 
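
As a small illustration of such an algorithm, the sketch below stores a polynomial in $P_3$ as its coordinate vector relative to the natural basis and differentiates it by multiplying by the 3 × 4 matrix from this example. The function name `differentiate` is a hypothetical choice, and plain Python lists stand in for matrices.

```python
# Matrix of D: P3 -> P2 relative to the natural bases {1, x, x^2, x^3} and {1, x, x^2}.
D = [
    [0, 1, 0, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 3],
]

def differentiate(coeffs):
    """Multiply D by the coordinate vector (a0, a1, a2, a3) of a0 + a1 x + a2 x^2 + a3 x^3."""
    return [sum(d * a for d, a in zip(row, coeffs)) for row in D]

# 2 + x + 4x^2 - x^3 has derivative 1 + 8x - 3x^2.
print(differentiate([2, 1, 4, -1]))  # [1, 8, -3]
```

The same pattern extends to $P_n$ by using the n × (n + 1) matrix whose only nonzero entries are 1, 2, ..., n just above the main diagonal position.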


Inner Product Space Isomorphisms 


In the case where V is a real n-dimensional inner product space, both V and $R^n$ have, in addition to their algebraic structure,
a geometric structure arising from their respective inner products. Thus, it is reasonable to inquire if there exists an
isomorphism from V to $R^n$ that preserves the geometric structure as well as the algebraic structure. For example, we would
want orthogonal vectors in V to have orthogonal counterparts in $R^n$, and we would want orthonormal sets in V to
correspond to orthonormal sets in $R^n$.


In order for an isomorphism to preserve geometric structure, it has to preserve inner products, since notions of
length, angle, and orthogonality are all based on the inner product. Thus, if V and W are inner product spaces, then we call
an isomorphism $T: V \to W$ an inner product space isomorphism if

$\langle T(\mathbf{u}), T(\mathbf{v}) \rangle = \langle \mathbf{u}, \mathbf{v} \rangle$ for all u and v in V


It can be proved that if V is any real n-dimensional inner product space and $R^n$ has the Euclidean inner product (the dot
product), then there exists an inner product space isomorphism from V to $R^n$. Under such an isomorphism, the inner
product space V has the same algebraic and geometric structure as $R^n$. In this sense, every n-dimensional inner product
space is a “carbon copy” of $R^n$ with the Euclidean inner product that differs only in the notation used to represent vectors.


EXAMPLE 10 An Inner Product Space Isomorphism


Let $R^n$ be the vector space of real n-tuples in comma-delimited form, let $M_n$ be the vector space of real n × 1
matrices, let $R^n$ have the Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u} \cdot \mathbf{v}$, and let $M_n$ have the inner product
$\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^T\mathbf{v}$ in which u and v are expressed in column form. The mapping $T: R^n \to M_n$ defined by

$(v_1, v_2, \ldots, v_n) \;\xrightarrow{\;T\;}\; \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$

is an inner product space isomorphism, so the distinction between the inner product space $R^n$ and the inner
product space $M_n$ is essentially notational, a fact that we have used many times in this text.


Concept Review

• One-to-one
• Onto
• Isomorphism
• Isomorphic vector spaces
• Natural isomorphism
• Inner product space isomorphism

Skills

• Determine whether a linear transformation is one-to-one.
• Determine whether a linear transformation is onto.
• Determine whether a linear transformation is an isomorphism.


Exercise Set 8.2 


1. In each part, find $\ker(T)$, and determine whether the linear transformation T is one-to-one.
(a) $T: R^2 \to R^2$, where $T(x, y) = (y, x)$
(b) $T: R^2 \to R^2$, where $T(x, y) = (0, 2x + 3y)$
(c) $T: R^2 \to R^2$, where $T(x, y) = (x + y, x - y)$
(d) $T: R^2 \to R^3$, where $T(x, y) = (x, y, x + y)$
(e) $T: R^2 \to R^3$, where $T(x, y) = (x - y, y - x, 2x - 2y)$
(f) $T: R^3 \to R^2$, where $T(x, y, z) = (x + y + z, x - y - z)$


Answer:

(a) $\ker(T) = \{\mathbf{0}\}$; T is one-to-one
(b) $\ker(T) = \{k(-\tfrac{3}{2}, 1)\}$; T is not one-to-one
(c) $\ker(T) = \{\mathbf{0}\}$; T is one-to-one
(d) $\ker(T) = \{\mathbf{0}\}$; T is one-to-one
(e) $\ker(T) = \{k(1, 1)\}$; T is not one-to-one
(f) $\ker(T) = \{k(0, 1, -1)\}$; T is not one-to-one


2. Which of the transformations in Exercise 1 are onto?
3. In each part, determine whether multiplication by A is a one-to-one linear transformation.
(a) $A = \begin{bmatrix} 1 & -2 \\ 2 & -4 \\ -3 & 6 \end{bmatrix}$
(b) $A = \begin{bmatrix} 1 & 3 & 5 & 7 \\ 2 & -1 & 2 & 4 \\ -1 & 3 & 0 & 0 \end{bmatrix}$
(c) $A = \begin{bmatrix} 4 & -2 \\ 1 & 5 \\ 5 & 3 \end{bmatrix}$

Answer:

(a) Not one-to-one
(b) Not one-to-one
(c) One-to-one


4. Which of the transformations in Exercise 3 are onto?


5. As indicated in the accompanying figure, let $T: R^2 \to R^2$ be the orthogonal projection on the line y = x.
(a) Find the kernel of T.
(b) Is T one-to-one? Justify your conclusion.


Figure Ex-5


Answer:

(a) $\ker(T) = \{k(-1, 1)\}$
(b) T is not one-to-one since $\ker(T) \neq \{\mathbf{0}\}$.

6. As indicated in the accompanying figure, let $T: R^2 \to R^2$ be the linear operator that reflects each point about the y-axis.
(a) Find the kernel of T.


(b) Is T one-to-one? Justify your conclusion. 


Figure Ex-6 


7. In each part, use the given information to determine whether the linear transformation T is one-to-one.
(a) $T: R^m \to R^m$, $\text{nullity}(T) = 0$
(b) $T: R^n \to R^n$, $\text{rank}(T) = n - 1$
(c) $T: R^m \to R^n$, $n < m$
(d) $T: R^n \to R^n$, $R(T) = R^n$


Answer: 


(a) T is one-to-one 
(b) T is not one-to-one 
(c) T is not one-to-one 
(d) T is one-to-one 

8. In each part, determine whether the linear transformation T is one-to-one.
(a) $T: P_2 \to P_3$, where $T(a_0 + a_1 x + a_2 x^2) = x(a_0 + a_1 x + a_2 x^2)$
(b) $T: P_2 \to P_2$, where $T(p(x)) = p(x + 1)$


9. Prove: If V and W are finite-dimensional vector spaces such that $\dim(W) < \dim(V)$, then there is no one-to-one linear
transformation $T: V \to W$.


10. Prove: There can be an onto linear transformation from V to W only if $\dim(V) \geq \dim(W)$.
11. (a) Find an isomorphism between the vector space of all 3 × 3 symmetric matrices and $R^6$.
(b) Find two different isomorphisms between the vector space of all 2 × 2 matrices and $R^4$.
(c) Find an isomorphism between the vector space of all polynomials of degree at most 3 such that $p(0) = 0$ and $R^3$.


(d) Find an isomorphism between the vector spaces span {1, sin(x), cos(x)} and R3. 


Answer:

(a) $T\left(\begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix}\right) = \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix}$

(b) For example, $T_1\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}$ and $T_2\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} a \\ c \\ b \\ d \end{bmatrix}$

(c) $T(ax^3 + bx^2 + cx) = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$

(d) $T(a + b\sin(x) + c\cos(x)) = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$


12. (Calculus required) Let $J: P_1 \to R$ be the integration transformation $J(\mathbf{p}) = \int_{-1}^{1} p(x)\,dx$. Determine whether J is
one-to-one. Justify your conclusion.

13. (Calculus required) Let V be the vector space $C^1[0, 1]$ and let $T: V \to R$ be defined by

$T(\mathbf{f}) = f(0) + 2f'(0) + 3f'(1)$

Verify that T is a linear transformation. Determine whether T is one-to-one, and justify your conclusion.


Answer:

T is not one-to-one since, for example, $f(x) = x^2(x - 1)^2$ is in its kernel.


14. (Calculus required) Devise a method for using matrix multiplication to differentiate functions in the vector space
span {1, sin(x), cos(x), sin(2x), cos(2x)}. Use your method to find the derivative of
$3 - 4\sin(x) + \sin(2x) + 5\cos(2x)$.


15. Does the formula $T(a, b, c) = ax^2 + bx + c$ define a one-to-one linear transformation from $R^3$ to $P_2$? Explain your
reasoning.

Answer:

Yes; it is one-to-one


16. Let E be a fixed 2 × 2 elementary matrix. Does the formula $T(A) = EA$ define a one-to-one linear operator on $M_{22}$?
Explain your reasoning.


17. Let a be a fixed vector in $R^3$. Does the formula $T(\mathbf{v}) = \mathbf{a} \times \mathbf{v}$ define a one-to-one linear operator on $R^3$? Explain your
reasoning.

Answer:

T is not one-to-one since, for example, a is in its kernel.


18. Prove that an inner product space isomorphism preserves angles and distances; that is, the angle between u and v in V
is equal to the angle between $T(\mathbf{u})$ and $T(\mathbf{v})$ in W, and $\|\mathbf{u} - \mathbf{v}\|_V = \|T(\mathbf{u}) - T(\mathbf{v})\|_W$.


19. Does an inner product space isomorphism map orthonormal sets to orthonormal sets? Justify your answer.

Answer:

Yes


20. Find an inner product space isomorphism between $P_5$ and $M_{23}$.


True-False Exercises 


In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 


(a) The vector spaces $R^2$ and $P_2$ are isomorphic.

Answer:

False


(b) If the kernel of a linear transformation $T: P_3 \to P_3$ is $\{\mathbf{0}\}$, then T is an isomorphism.

Answer:

True


(c) Every linear transformation from $M_{33}$ to $P_9$ is an isomorphism.

Answer:

False


(d) There is a subspace of $M_{23}$ that is isomorphic to $R^4$.

Answer:

True


(e) There is a 2 × 2 matrix P such that $T: M_{22} \to M_{22}$ defined by $T(A) = AP - PA$ is an isomorphism.

Answer:

False


(f) There is a linear transformation $T: P_4 \to P_4$ such that the kernel of T is isomorphic to the range of T.
Answer: 


False 




8.3 Compositions and Inverse Transformations 


In Section 4.10 we discussed compositions and inverses of matrix transformations. In this section we will 
extend some of those ideas to general linear transformations. 


Composition of Linear Transformations 
The following definition extends Formula 1 of Section 4.10 to general linear transformations. 


Note that the word “with” establishes the order
of the operations in a composition. The
composition of $T_2$ with $T_1$ is

$(T_2 \circ T_1)(\mathbf{u}) = T_2(T_1(\mathbf{u}))$

whereas the composition of $T_1$ with $T_2$ is

$(T_1 \circ T_2)(\mathbf{u}) = T_1(T_2(\mathbf{u}))$


DEFINITION 1 


If $T_1: U \to V$ and $T_2: V \to W$ are linear transformations, then the composition of $T_2$ with $T_1$,
denoted by $T_2 \circ T_1$ (which is read “$T_2$ circle $T_1$”), is the function defined by the formula

$(T_2 \circ T_1)(\mathbf{u}) = T_2(T_1(\mathbf{u}))$ (1)

where u is a vector in U.


Remark Observe that this definition requires that the domain of $T_2$ (which is V) contain the range of $T_1$.
This is essential for the formula $T_2(T_1(\mathbf{u}))$ to make sense (Figure 8.3.1).


Figure 8.3.1 The composition of $T_2$ with $T_1$: u in U is mapped to $T_1(\mathbf{u})$ in V and then to $T_2(T_1(\mathbf{u}))$ in W.


Our first theorem shows that the composition of two linear transformations is itself a linear transformation. 


THEOREM 8.3.1 


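If $T_1: U \to V$ and $T_2: V \to W$ are linear transformations, then $(T_2 \circ T_1): U \to W$ is also a linear
transformation.


Proof If u and v are vectors in U and c is a scalar, then it follows from 1 and the linearity of $T_1$ and $T_2$ that

$(T_2 \circ T_1)(\mathbf{u} + \mathbf{v}) = T_2(T_1(\mathbf{u} + \mathbf{v})) = T_2(T_1(\mathbf{u}) + T_1(\mathbf{v}))$
$= T_2(T_1(\mathbf{u})) + T_2(T_1(\mathbf{v}))$
$= (T_2 \circ T_1)(\mathbf{u}) + (T_2 \circ T_1)(\mathbf{v})$

and

$(T_2 \circ T_1)(c\mathbf{u}) = T_2(T_1(c\mathbf{u})) = T_2(cT_1(\mathbf{u}))$
$= cT_2(T_1(\mathbf{u})) = c(T_2 \circ T_1)(\mathbf{u})$

Thus, $T_2 \circ T_1$ satisfies the two requirements of a linear transformation.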


EXAMPLE 1 Composition of Linear Transformations 


Let $T_1: P_1 \to P_2$ and $T_2: P_2 \to P_2$ be the linear transformations given by the formulas

$T_1(p(x)) = xp(x)$ and $T_2(p(x)) = p(2x + 4)$

Then the composition $(T_2 \circ T_1): P_1 \to P_2$ is given by the formula

$(T_2 \circ T_1)(p(x)) = T_2(T_1(p(x))) = T_2(xp(x)) = (2x + 4)p(2x + 4)$

In particular, if $p(x) = c_0 + c_1 x$, then

$(T_2 \circ T_1)(p(x)) = (T_2 \circ T_1)(c_0 + c_1 x) = (2x + 4)(c_0 + c_1(2x + 4))$
$= c_0(2x + 4) + c_1(2x + 4)^2$
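
The composition in this example can also be carried out on coefficient lists. The sketch below is an illustration under stated assumptions: a polynomial $c_0 + c_1 x + \cdots$ is stored as the list `[c0, c1, ...]`, and the helper names (`poly_mul`, `poly_add`, `trim`, `compose`) are hypothetical. It applies $T_1$ and then $T_2$ to a sample polynomial and compares the result with the closed form $c_0(2x + 4) + c_1(2x + 4)^2$.

```python
def trim(p):
    """Drop trailing zero coefficients, keeping at least one entry."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def T1(p):
    """T1(p(x)) = x p(x): every coefficient moves up one degree."""
    return [0] + p

def T2(p):
    """T2(p(x)) = p(2x + 4): substitute 2x + 4 for x using Horner's rule."""
    result = [0]
    for coeff in reversed(p):
        result = poly_add(poly_mul(result, [4, 2]), [coeff])
    return trim(result)

def compose(f, g):
    """Return the function u -> f(g(u)), that is, f composed with g."""
    return lambda u: f(g(u))

c0, c1 = 3, 5                      # sample coefficients, p(x) = 3 + 5x
lhs = compose(T2, T1)([c0, c1])    # (T2 o T1)(p)

# Closed form from the example: c0 (2x + 4) + c1 (2x + 4)^2.
square = poly_mul([4, 2], [4, 2])
rhs = trim(poly_add([4 * c0, 2 * c0], [c1 * k for k in square]))

print(lhs, rhs)  # both are [92, 86, 20], i.e. 92 + 86x + 20x^2
```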


EXAMPLE 2 Composition with the Identity Operator


If $T: V \to V$ is any linear operator, and if $I: V \to V$ is the identity operator (Example 3 of Section
8.1), then for all vectors v in V, we have

$(T \circ I)(\mathbf{v}) = T(I(\mathbf{v})) = T(\mathbf{v})$

$(I \circ T)(\mathbf{v}) = I(T(\mathbf{v})) = T(\mathbf{v})$

It follows that $T \circ I$ and $I \circ T$ are the same as T; that is,

$T \circ I = T$ and $I \circ T = T$ (2)


As illustrated in Figure 8.3.2, compositions can be defined for more than two linear transformations. For
example, if

$T_1: U \to V$, $T_2: V \to W$, and $T_3: W \to Y$

are linear transformations, then the composition $T_3 \circ T_2 \circ T_1$ is defined by

$(T_3 \circ T_2 \circ T_1)(\mathbf{u}) = T_3(T_2(T_1(\mathbf{u})))$ (3)


Figure 8.3.2 The composition of three linear transformations: u in U is mapped to $T_1(\mathbf{u})$ in V, then to $T_2(T_1(\mathbf{u}))$ in W, then to $T_3(T_2(T_1(\mathbf{u})))$ in Y.


Inverse Linear Transformations 


In Theorem 4.10.1 we showed that a matrix operator $T_A: R^n \to R^n$ is one-to-one if and only if the matrix A is
invertible, in which case the inverse operator is $T_{A^{-1}}$. We then showed that if w is the image of a vector x
under the operator $T_A$, then x is the image under $T_{A^{-1}}$ of the vector w (see Figure 4.10.8). Our next objective
is to extend the notion of invertibility to general linear transformations.


Recall that if $T: V \to W$ is a linear transformation, then the range of T, denoted by $R(T)$, is the subspace of W
consisting of all images under T of vectors in V. If T is one-to-one, then each vector w in $R(T)$ is the image of
a unique vector v in V. This uniqueness allows us to define a new function, called the inverse of T and
denoted by $T^{-1}$, that maps w back into v (Figure 8.3.3).


Figure 8.3.3 The inverse of T maps $T(\mathbf{v})$ back into v.


It can be proved (Exercise 19) that $T^{-1}: R(T) \to V$ is a linear transformation. Moreover, it follows from the
definition of $T^{-1}$ that

$T^{-1}(T(\mathbf{v})) = T^{-1}(\mathbf{w}) = \mathbf{v}$ (4)

$T(T^{-1}(\mathbf{w})) = T(\mathbf{v}) = \mathbf{w}$ (5)

so that T and $T^{-1}$, when applied in succession in either order, cancel the effect of each other.


Remark It is important to note that if $T: V \to W$ is a one-to-one linear transformation, then the domain of
$T^{-1}$ is the range of T, where the range may or may not be all of W. However, in the special case where
$T: V \to V$ is a one-to-one linear operator and V is n-dimensional, it follows from Theorem 8.2.2 that T
must also be onto, so the domain of $T^{-1}$ is all of V.


EXAMPLE 3 An Inverse Transformation 


In Example 5 of Section 8.2 we showed that the linear transformation $T: P_n \to P_{n+1}$ given by

$T(\mathbf{p}) = T(p(x)) = xp(x)$

is one-to-one; thus, T has an inverse. In this case the range of T is not all of $P_{n+1}$ but rather the
subspace of $P_{n+1}$ consisting of polynomials with a zero constant term. This is evident from the
formula for T:

$T(c_0 + c_1 x + \cdots + c_n x^n) = c_0 x + c_1 x^2 + \cdots + c_n x^{n+1}$

It follows that $T^{-1}: R(T) \to P_n$ is given by the formula

$T^{-1}(c_0 x + c_1 x^2 + \cdots + c_n x^{n+1}) = c_0 + c_1 x + \cdots + c_n x^n$

For example, in the case where $n \geq 3$,

$T^{-1}(2x - x^2 + 5x^3 + 3x^4) = 2 - x + 5x^2 + 3x^3$


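EXAMPLE 4 An Inverse Transformation


Let $T: R^3 \to R^3$ be the linear operator defined by the formula

$T(x_1, x_2, x_3) = (3x_1 + x_2, -2x_1 - 4x_2 + 3x_3, 5x_1 + 4x_2 - 2x_3)$

Determine whether T is one-to-one; if so, find $T^{-1}(x_1, x_2, x_3)$.


Solution It follows from Formula 12 of Section 4.9 that the standard matrix for T is

$[T] = \begin{bmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{bmatrix}$

(verify). This matrix is invertible, and from Formula 7 of Section 4.10 the standard matrix for
$T^{-1}$ is

$[T^{-1}] = [T]^{-1} = \begin{bmatrix} 4 & -2 & -3 \\ -11 & 6 & 9 \\ -12 & 7 & 10 \end{bmatrix}$

It follows that

$T^{-1}\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = [T^{-1}]\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 & -2 & -3 \\ -11 & 6 & 9 \\ -12 & 7 & 10 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4x_1 - 2x_2 - 3x_3 \\ -11x_1 + 6x_2 + 9x_3 \\ -12x_1 + 7x_2 + 10x_3 \end{bmatrix}$

Expressing this result in horizontal notation yields

$T^{-1}(x_1, x_2, x_3) = (4x_1 - 2x_2 - 3x_3, -11x_1 + 6x_2 + 9x_3, -12x_1 + 7x_2 + 10x_3)$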
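
The "(verify)" step in this example can be done by direct multiplication. The sketch below assumes matrices stored as lists of rows; the helper names `mat_mul` and `apply` are hypothetical. It checks that the two standard matrices multiply to the identity in both orders and that $T^{-1}$ undoes T on a sample vector.

```python
T = [[3, 1, 0], [-2, -4, 3], [5, 4, -2]]          # standard matrix for T
T_inv = [[4, -2, -3], [-11, 6, 9], [-12, 7, 10]]  # claimed standard matrix for T^(-1)

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def apply(A, x):
    """Multiply the matrix A by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(mat_mul(T, T_inv) == identity, mat_mul(T_inv, T) == identity)

x = [1, 2, 3]                 # an arbitrary sample vector
w = apply(T, x)               # w = T(x)
print(w, apply(T_inv, w))     # applying T^(-1) to w recovers x
```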


Composition of One-To-One Linear Transformations 


The following theorem shows that a composition of one-to-one linear transformations is one-to-one, and it 
relates the inverse of a composition to the inverses of its individual linear transformations. 


THEOREM 8.3.2 


If $T_1: U \to V$ and $T_2: V \to W$ are one-to-one linear transformations, then
(a) $T_2 \circ T_1$ is one-to-one.
(b) $(T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1}$.


Proof (a) We want to show that $T_2 \circ T_1$ maps distinct vectors in U into distinct vectors in W. But if u and v
are distinct vectors in U, then $T_1(\mathbf{u})$ and $T_1(\mathbf{v})$ are distinct vectors in V since $T_1$ is one-to-one. This and the
fact that $T_2$ is one-to-one imply that

$T_2(T_1(\mathbf{u}))$ and $T_2(T_1(\mathbf{v}))$

are also distinct vectors. But these expressions can also be written as

$(T_2 \circ T_1)(\mathbf{u})$ and $(T_2 \circ T_1)(\mathbf{v})$

so $T_2 \circ T_1$ maps u and v into distinct vectors in W.


Proof (b) We want to show that

$(T_2 \circ T_1)^{-1}(\mathbf{w}) = (T_1^{-1} \circ T_2^{-1})(\mathbf{w})$

for every vector w in the range of $T_2 \circ T_1$. For this purpose, let

$\mathbf{u} = (T_2 \circ T_1)^{-1}(\mathbf{w})$ (6)

so our goal is to show that

$\mathbf{u} = (T_1^{-1} \circ T_2^{-1})(\mathbf{w})$

But it follows from 6 that

$(T_2 \circ T_1)(\mathbf{u}) = \mathbf{w}$

or, equivalently,

$T_2(T_1(\mathbf{u})) = \mathbf{w}$

Now, taking $T_2^{-1}$ of each side of this equation, then taking $T_1^{-1}$ of each side of the result, and then using 4
yields (verify)

$\mathbf{u} = T_1^{-1}(T_2^{-1}(\mathbf{w}))$

or, equivalently,

$\mathbf{u} = (T_1^{-1} \circ T_2^{-1})(\mathbf{w})$


In words, part (b) of Theorem 8.3.2 states that the inverse of a composition is the composition of the inverses
in the reverse order. This result can be extended to compositions of three or more linear transformations; for
example,

$(T_3 \circ T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1} \circ T_3^{-1}$ (7)

In the case where $T_A$, $T_B$, and $T_C$ are matrix operators on $R^n$, Formula 7 can be written as

$(T_C \circ T_B \circ T_A)^{-1} = T_A^{-1} \circ T_B^{-1} \circ T_C^{-1}$

or alternatively as

$(T_{CBA})^{-1} = T_{A^{-1}B^{-1}C^{-1}}$ (8)


Note the order of the subscripts on the two 
sides of Formula 8. 
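
The reversal of order in Formula 8 can be seen on a concrete case. The sketch below is illustrative only: the three 2 × 2 matrices are arbitrary invertible choices that do not commute, and the helper names `inv2` and `mat_mul` are hypothetical. It checks that the inverse of CBA equals the product of the inverses in reverse order, and that keeping the original order gives a different matrix.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    """Inverse of a 2 x 2 matrix by the adjoint formula, with exact fractions."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

# Arbitrary invertible matrices chosen for illustration.
A = [[1, 1], [0, 1]]
B = [[2, 1], [1, 1]]
C = [[1, 0], [3, 1]]

CBA = mat_mul(C, mat_mul(B, A))
left = inv2(CBA)                                           # (CBA)^(-1)
right = mat_mul(inv2(A), mat_mul(inv2(B), inv2(C)))        # A^(-1) B^(-1) C^(-1)
wrong_order = mat_mul(inv2(C), mat_mul(inv2(B), inv2(A)))  # C^(-1) B^(-1) A^(-1)

print(left == right)        # True: inverses composed in reverse order
print(left == wrong_order)  # False for these matrices
```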


Concept Review 
• Composition of linear transformations
• Inverse of a linear transformation


Skills

• Find the domain and range of the composition of two linear transformations.
• Find the composition of two linear transformations.
• Determine whether a linear transformation has an inverse.
• Find the inverse of a linear transformation.


Exercise Set 8.3 


1. Find $(T_2 \circ T_1)(x, y)$.
(a) $T_1(x, y) = (2x, 3y)$, $T_2(x, y) = (x - y, x + y)$
(b) $T_1(x, y) = (x - 3y, 0)$, $T_2(x, y) = (4x - 5y, 3x - 6y)$
(c) $T_1(x, y) = (2x, -3y, x + y)$, $T_2(x, y, z) = (x - y, y + z)$
(d) $T_1(x, y) = (x - y, y, x)$, $T_2(x, y, z) = (0, x + y + z)$


Answer:

(a) $(T_2 \circ T_1)(x, y) = (2x - 3y, 2x + 3y)$
(b) $(T_2 \circ T_1)(x, y) = (4x - 12y, 3x - 9y)$
(c) $(T_2 \circ T_1)(x, y) = (2x + 3y, x - 2y)$
(d) $(T_2 \circ T_1)(x, y) = (0, 2x)$

2. Find $(T_3 \circ T_2 \circ T_1)(x, y)$.
(a) $T_1(x, y) = (-2y, 3x, x - 2y)$, $T_2(x, y, z) = (y, z, x)$, $T_3(x, y, z) = (x + z, y - z)$
(b) $T_1(x, y) = (x + y, y, -x)$, $T_2(x, y, z) = (0, x + y + z, 3y)$,
$T_3(x, y, z) = (3x + 2y, 4z - x - 3y)$


3. Let $T_1: M_{22} \to R$ and $T_2: M_{22} \to M_{22}$ be the linear transformations given by $T_1(A) = \text{tr}(A)$ and
$T_2(A) = A^T$.
(a) Find $(T_1 \circ T_2)(A)$, where $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.
(b) Can you find $(T_2 \circ T_1)(A)$? Explain.


Answer:

(a) $a + d$
(b) $(T_2 \circ T_1)(A)$ does not exist since $T_1(A)$ is not a 2 × 2 matrix.


4. Let $T_1: P_n \to P_n$ and $T_2: P_n \to P_n$ be the linear operators given by $T_1(p(x)) = p(x - 1)$ and
$T_2(p(x)) = p(x + 1)$. Find $(T_1 \circ T_2)(p(x))$ and $(T_2 \circ T_1)(p(x))$.

5. Let $T_1: V \to V$ be the dilation $T_1(\mathbf{v}) = 4\mathbf{v}$. Find a linear operator $T_2: V \to V$ such that $T_1 \circ T_2 = I$ and
$T_2 \circ T_1 = I$.


Answer:

$T_2(\mathbf{v}) = \tfrac{1}{4}\mathbf{v}$


6. Suppose that the linear transformations $T_1: P_2 \to P_2$ and $T_2: P_2 \to P_3$ are given by the formulas
$T_1(p(x)) = p(x + 1)$ and $T_2(p(x)) = xp(x)$. Find $(T_2 \circ T_1)(a_0 + a_1 x + a_2 x^2)$.
7. Let $q_0(x)$ be a fixed polynomial of degree m, and define a function T with domain $P_n$ by the formula
$T(p(x)) = p(q_0(x))$. Show that T is a linear transformation.
8. Use the definition of $T_3 \circ T_2 \circ T_1$ given by Formula 3 to prove that
(a) $T_3 \circ T_2 \circ T_1$ is a linear transformation.
(b) $T_3 \circ T_2 \circ T_1 = (T_3 \circ T_2) \circ T_1$.
(c) $T_3 \circ T_2 \circ T_1 = T_3 \circ (T_2 \circ T_1)$.
9. Let $T: R^3 \to R^3$ be the orthogonal projection of $R^3$ onto the xy-plane. Show that $T \circ T = T$.


10. In each part, let $T: R^2 \to R^2$ be multiplication by A. Determine whether T has an inverse; if so, find
$T^{-1}\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right)$


(a)-g ae 
[3 


(b) $A = \begin{bmatrix} 6 & -3 \\ 4 & -2 \end{bmatrix}$
(c) A= mee 
-1 3 
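11. In each part, let $T: R^3 \to R^3$ be multiplication by A. Determine whether T has an inverse; if so, find
$T^{-1}\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right)$
(a) $A = \begin{bmatrix} 1 & 5 & 2 \\ 1 & 2 & 1 \\ -1 & 1 & 0 \end{bmatrix}$
(b) $A = \begin{bmatrix} 1 & 4 & -1 \\ 1 & 2 & 1 \\ -1 & 1 & 0 \end{bmatrix}$
(c) $A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}$
(d) $A = \begin{bmatrix} 1 & -1 & 1 \\ 0 & 2 & -1 \\ 2 & 3 & 0 \end{bmatrix}$


Answer:

(a) T has no inverse.

(b) $T^{-1}\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} \tfrac{1}{8}x_1 + \tfrac{1}{8}x_2 - \tfrac{3}{4}x_3 \\ \tfrac{1}{8}x_1 + \tfrac{1}{8}x_2 + \tfrac{1}{4}x_3 \\ -\tfrac{3}{8}x_1 + \tfrac{5}{8}x_2 + \tfrac{1}{4}x_3 \end{bmatrix}$

(c) $T^{-1}\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} \tfrac{1}{2}x_1 - \tfrac{1}{2}x_2 + \tfrac{1}{2}x_3 \\ -\tfrac{1}{2}x_1 + \tfrac{1}{2}x_2 + \tfrac{1}{2}x_3 \\ \tfrac{1}{2}x_1 + \tfrac{1}{2}x_2 - \tfrac{1}{2}x_3 \end{bmatrix}$

(d) $T^{-1}\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 3x_1 + 3x_2 - x_3 \\ -2x_1 - 2x_2 + x_3 \\ -4x_1 - 5x_2 + 2x_3 \end{bmatrix}$


12. In each part, determine whether the linear operator $T: R^n \to R^n$ is one-to-one; if so, find
$T^{-1}(x_1, x_2, \ldots, x_n)$.
(a) $T(x_1, x_2, \ldots, x_n) = (0, x_1, x_2, \ldots, x_{n-1})$
(b) $T(x_1, x_2, \ldots, x_n) = (x_n, x_{n-1}, \ldots, x_2, x_1)$
(c) $T(x_1, x_2, \ldots, x_n) = (x_2, x_3, \ldots, x_n, x_1)$

13. Let $T: R^n \to R^n$ be the linear operator defined by the formula

$T(x_1, x_2, \ldots, x_n) = (a_1 x_1, a_2 x_2, \ldots, a_n x_n)$

where $a_1, \ldots, a_n$ are constants.
(a) Under what conditions will T have an inverse?
(b) Assuming that the conditions determined in part (a) are satisfied, find a formula for
$T^{-1}(x_1, x_2, \ldots, x_n)$.


Answer:

(a) $a_i \neq 0$ for $i = 1, 2, 3, \ldots, n$

(b) $T^{-1}(x_1, x_2, x_3, \ldots, x_n) = \left(\tfrac{1}{a_1}x_1, \tfrac{1}{a_2}x_2, \tfrac{1}{a_3}x_3, \ldots, \tfrac{1}{a_n}x_n\right)$


14. Let $T_1: R^2 \to R^2$ and $T_2: R^2 \to R^2$ be the linear operators given by the formulas

$T_1(x, y) = (x + y, x - y)$ and $T_2(x, y) = (2x + y, x - 2y)$

(a) Show that $T_1$ and $T_2$ are one-to-one.
(b) Find formulas for

$T_1^{-1}(x, y)$, $T_2^{-1}(x, y)$, $(T_2 \circ T_1)^{-1}(x, y)$

(c) Verify that $(T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1}$.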


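15. Let $T_1: P_2 \to P_3$ and $T_2: P_3 \to P_3$ be the linear transformations given by the formulas

$T_1(p(x)) = xp(x)$ and $T_2(p(x)) = p(x + 1)$

(a) Find formulas for $T_1^{-1}(p(x))$, $T_2^{-1}(p(x))$, and $(T_2 \circ T_1)^{-1}(p(x))$.
(b) Verify that $(T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1}$.


Answer:

(a) $T_1^{-1}(p(x)) = \dfrac{p(x)}{x}$; $T_2^{-1}(p(x)) = p(x - 1)$; $(T_2 \circ T_1)^{-1}(p(x)) = \dfrac{1}{x}\,p(x - 1)$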
16. Let $T_A: R^3 \to R^3$, $T_B: R^3 \to R^3$, and $T_C: R^3 \to R^3$ be the reflections about the xy-plane, the xz-plane, and
the yz-plane, respectively. Verify Formula 8 for these linear operators.
17. Let $T: P_1 \to R^2$ be the function defined by the formula

$T(p(x)) = (p(0), p(1))$

(a) Find $T(1 - 2x)$.
(b) Show that T is a linear transformation.
(c) Show that T is one-to-one.
(d) Find $T^{-1}(2, 3)$, and sketch its graph.


Answer:

(a) $(1, -1)$
(d) $T^{-1}(2, 3) = 2 + x$


18. Let $T: R^2 \to R^2$ be the linear operator given by the formula $T(x, y) = (x + ky, -y)$. Show that T is
one-to-one and that $T^{-1} = T$ for every real value of k.
19. Prove: If $T: V \to W$ is a one-to-one linear transformation, then $T^{-1}: R(T) \to V$ is a one-to-one linear
transformation.


In Exercises 20-21, determine whether $T_1 \circ T_2 = T_2 \circ T_1$.


20. (a) $T_1: R^2 \to R^2$ is the orthogonal projection on the x-axis, and $T_2: R^2 \to R^2$ is the orthogonal projection
on the y-axis.
(b) $T_1: R^2 \to R^2$ is the rotation about the origin through an angle $\theta_1$, and $T_2: R^2 \to R^2$ is the rotation
about the origin through an angle $\theta_2$.
(c) $T_1: R^3 \to R^3$ is the rotation about the x-axis through an angle $\theta_1$, and $T_2: R^3 \to R^3$ is the rotation
about the z-axis through an angle $\theta_2$.


21. (a) $T_1: R^2 \to R^2$ is the reflection about the x-axis, and $T_2: R^2 \to R^2$ is the reflection about the y-axis.
(b) $T_1: R^2 \to R^2$ is the orthogonal projection on the x-axis, and $T_2: R^2 \to R^2$ is the counterclockwise
rotation through an angle $\theta$.
(c) $T_1: R^3 \to R^3$ is a dilation by a factor k, and $T_2: R^3 \to R^3$ is the counterclockwise rotation about the
z-axis through an angle $\theta$.


Answer:

(a) $T_1 \circ T_2 = T_2 \circ T_1$
(b) $T_1 \circ T_2 \neq T_2 \circ T_1$
(c) $T_1 \circ T_2 = T_2 \circ T_1$


22. (Calculus required) Let

$D(\mathbf{f}) = f'(x)$ and $J(\mathbf{f}) = \int_0^x f(t)\,dt$

be the linear transformations in Examples 11 and 12 of Section 8.1. Find $(J \circ D)(\mathbf{f})$ for
(a) $f(x) = x^2 + 3x + 2$
(b) $f(x) = \sin x$
(c) $f(x) = e^x + 3$


23. (Calculus required) The Fundamental Theorem of Calculus implies that integration and differentiation
reverse the actions of each other. Define a transformation $D: P_n \to P_{n-1}$ by $D(p(x)) = p'(x)$, and
define $J: P_{n-1} \to P_n$ by

$J(p(x)) = \int_0^x p(t)\,dt$

(a) Show that D and J are linear transformations.
(b) Explain why J is not the inverse transformation of D.
(c) Can the domains and/or codomains of D and J be restricted so they are inverse linear transformations?


True-False Exercises 


In parts (a)-(f) determine whether the statement is true or false, and justify your answer. 


(a) The composition of two linear transformations is also a linear transformation. 


Answer: 


True 


(b) If $T_1: V \to V$ and $T_2: V \to V$ are any two linear operators, then $T_1 \circ T_2 = T_2 \circ T_1$.


Answer: 


False 


(c) The inverse of a linear transformation is a linear transformation. 


Answer: 


False 


(d) If a linear transformation 7 has an inverse, then the kernel of T is the zero subspace. 
Answer: 


True 
(e) If $T: R^2 \to R^2$ is the orthogonal projection onto the x-axis, then $T^{-1}: R^2 \to R^2$ maps each point on the
x-axis onto a line that is perpendicular to the x-axis.
Answer: 


False 


(f) If $T_1: U \to V$ and $T_2: V \to W$ are linear transformations, and if $T_1$ is not one-to-one, then neither is
$T_2 \circ T_1$.


Answer: 


True 




8.4 Matrices for General Linear Transformations 


In this section we will show that a general linear transformation from any n-dimensional vector space V to any
m-dimensional vector space W can be performed using an appropriate matrix transformation from $R^n$ to $R^m$. This idea is
used in computer computations, since computers are well suited to matrix arithmetic.


Matrices of Linear Transformations 


Suppose that V is an n-dimensional vector space, W is an m-dimensional vector space, and that 7: }” —, }# is a linear 
transformation. Suppose further that B is a basis for V, that 8" is a basis for W, and that for each vector x in V, the 
coordinate matrices for x and T(x) are [x] p and [7(x) ] p’, respectively (Figure 8.4.1). 


r 


A vector x me Tix) A vector 
in V { in W 
| (n-dimensional) (m-dimensional) 
A vector A vector 
nul 7 mm 
inR Ix], [7ix)],. inR 
Figure 8.4.1 


It will be our goal to find an $m \times n$ matrix A such that multiplication by A maps the vector $[x]_B$ into the vector $[T(x)]_{B'}$ for each x in V (Figure 8.4.2a). If we can do so, then, as illustrated in Figure 8.4.2b, we will be able to execute the linear transformation T by using matrix multiplication and the following indirect procedure:


Finding T (x) Indirectly 


Step 1. Compute the coordinate vector $[x]_B$.
Step 2. Multiply $[x]_B$ on the left by A to produce $[T(x)]_{B'}$.
Step 3. Reconstruct T(x) from its coordinate vector $[T(x)]_{B'}$.


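Figure 8.4.2 [Diagram: (a) T maps V into W, sending x to T(x), while multiplication by A maps $R^n$ into $R^m$, sending $[x]_B$ to $[T(x)]_{B'}$. (b) The direct computation x → T(x) is replaced by the three-step route x → $[x]_B$ → (multiply by A) → $[T(x)]_{B'}$ → T(x).]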


The key to executing this plan is to find an $m \times n$ matrix A with the property that
\[ A[x]_B = [T(x)]_{B'} \tag{1} \]


For this purpose, let $B = \{u_1, u_2, \ldots, u_n\}$ be a basis for the n-dimensional space V and $B' = \{v_1, v_2, \ldots, v_m\}$ a basis for the m-dimensional space W. Since Equation 1 must hold for all vectors in V, it must hold, in particular, for the basis vectors in B; that is,


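\[ A[u_1]_B = [T(u_1)]_{B'}, \quad A[u_2]_B = [T(u_2)]_{B'}, \quad \ldots, \quad A[u_n]_B = [T(u_n)]_{B'} \tag{2} \]
But
\[ [u_1]_B = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad [u_2]_B = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad [u_n]_B = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \]
so
\[ A[u_1]_B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} \]
\[ A[u_2]_B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} \]
\[ \vdots \]
\[ A[u_n]_B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} = \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} \]
Substituting these results into 2 yields
\[ \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} = [T(u_1)]_{B'}, \quad \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} = [T(u_2)]_{B'}, \quad \ldots, \quad \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} = [T(u_n)]_{B'} \]

which shows that the successive columns of A are the coordinate vectors of
\[ T(u_1),\; T(u_2),\; \ldots,\; T(u_n) \]
with respect to the basis $B'$. Thus, the matrix A that completes the link in Figure 8.4.2a is

\[ A = \left[\, [T(u_1)]_{B'} \mid [T(u_2)]_{B'} \mid \cdots \mid [T(u_n)]_{B'} \,\right] \tag{3} \]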


We will call this the matrix for T relative to the bases B and $B'$ and will denote it by the symbol $[T]_{B',B}$. Using this notation, Formula 3 can be written as

\[ [T]_{B',B} = \left[\, [T(u_1)]_{B'} \mid [T(u_2)]_{B'} \mid \cdots \mid [T(u_n)]_{B'} \,\right] \tag{4} \]
and from 1, this matrix has the property
\[ [T]_{B',B}\,[x]_B = [T(x)]_{B'} \tag{5} \]

We leave it as an exercise to show that in the special case where $T_A: R^n \to R^m$ is multiplication by A, and where B and $B'$ are the standard bases for $R^n$ and $R^m$, respectively, then

\[ [T_A]_{B',B} = A \tag{6} \]


Remark Observe that in the notation $[T]_{B',B}$ the right subscript is a basis for the domain of T, and the left subscript is a basis for the image space of T (Figure 8.4.3). Moreover, observe how the subscript B seems to “cancel out” in Formula 5 (Figure 8.4.4).


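Figure 8.4.3 [Diagram: in $[T]_{B',B}$ the left subscript $B'$ is the basis for the image space and the right subscript B is the basis for the domain.]

Figure 8.4.4 [Diagram: in $[T]_{B',B}[x]_B = [T(x)]_{B'}$ the adjacent subscripts B cancel.]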


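EXAMPLE 1 Matrix for a Linear Transformation


Let $T: P_1 \to P_2$ be the linear transformation defined by
\[ T(p(x)) = x\,p(x) \]
Find the matrix for T with respect to the standard bases
\[ B = \{u_1, u_2\} \quad \text{and} \quad B' = \{v_1, v_2, v_3\} \]
where
\[ u_1 = 1, \quad u_2 = x; \qquad v_1 = 1, \quad v_2 = x, \quad v_3 = x^2 \]


Solution From the given formula for T we obtain
\[ T(u_1) = T(1) = (x)(1) = x \]
\[ T(u_2) = T(x) = (x)(x) = x^2 \]

By inspection, the coordinate vectors for $T(u_1)$ and $T(u_2)$ relative to $B'$ are
\[ [T(u_1)]_{B'} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad [T(u_2)]_{B'} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \]
Thus, the matrix for T with respect to B and $B'$ is
\[ [T]_{B',B} = \left[\, [T(u_1)]_{B'} \mid [T(u_2)]_{B'} \,\right] = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \]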
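
The column-by-column construction in Formula 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: polynomials are stored as coefficient lists relative to the standard basis, and the helper names `times_x` and `matrix_of` are hypothetical, not part of the text.

```python
# Assumption: a polynomial c0 + c1*x + ... is stored as the list [c0, c1, ...],
# i.e. as its coordinate vector relative to the standard basis {1, x, x^2, ...}.

def times_x(p):
    """T(p(x)) = x*p(x): every coefficient moves up one degree."""
    return [0] + list(p)

def matrix_of(T, domain_basis, codomain_dim):
    """Build [T]_{B',B} for standard bases: column j holds the coordinates of T(u_j)."""
    columns = []
    for u in domain_basis:
        image = T(u)
        image = image + [0] * (codomain_dim - len(image))  # pad to length m
        columns.append(image)
    # Transpose the list of columns into a list of rows.
    return [[col[i] for col in columns] for i in range(codomain_dim)]

B = [[1, 0], [0, 1]]            # u1 = 1 and u2 = x in P1
A = matrix_of(times_x, B, 3)    # P2 has dimension 3
print(A)
```

The rows of `A` reproduce the 3 × 2 matrix found in Example 1.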


EXAMPLE 2 The Three-Step Procedure 


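Let $T: P_1 \to P_2$ be the linear transformation in Example 1, and use the three-step procedure described in the following figure to perform the computation
\[ T(a + bx) = x(a + bx) = ax + bx^2 \]

[Diagram: the direct computation x → T(x) is replaced by the route x → $[x]_B$ → (multiply by $[T]_{B',B}$) → $[T(x)]_{B'}$ → T(x).]


Solution

Step 1. The coordinate matrix for $x = a + bx$ relative to the basis $B = \{1, x\}$ is
\[ [x]_B = \begin{bmatrix} a \\ b \end{bmatrix} \]

Step 2. Multiplying $[x]_B$ by the matrix $[T]_{B',B}$ found in Example 1 we obtain
\[ [T]_{B',B}[x]_B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 0 \\ a \\ b \end{bmatrix} = [T(x)]_{B'} \]

Step 3. Reconstructing $T(x) = T(a + bx)$ from $[T(x)]_{B'}$ we obtain
\[ T(a + bx) = 0 + ax + bx^2 = ax + bx^2 \]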


Although Example 2 is simple, the procedure that it 
illustrates is applicable to problems of great 
complexity. 
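
As a rough sketch of how the three steps look in code, the following Python fragment repeats Example 2 for one concrete choice of a and b. The function names and the sample values a = 4, b = 7 are assumptions made for illustration only.

```python
def mat_vec(A, v):
    """Multiply the matrix A (a list of rows) by the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# [T]_{B',B} from Example 1, for T(p(x)) = x*p(x) with standard bases of P1 and P2.
T_MATRIX = [[0, 0], [1, 0], [0, 1]]

def apply_T_indirectly(a, b):
    coords = [a, b]                            # Step 1: [x]_B for x = a + bx
    image_coords = mat_vec(T_MATRIX, coords)   # Step 2: multiply by [T]_{B',B}
    return image_coords                        # Step 3: coefficients of 1, x, x^2 in T(x)

result = apply_T_indirectly(4, 7)
print(result)   # coefficients of T(4 + 7x) relative to {1, x, x^2}
```

With a = 4 and b = 7 the returned coefficients correspond to 4x + 7x^2, in agreement with the formula T(a + bx) = ax + bx^2.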


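EXAMPLE 3 Matrix for a Linear Transformation


Let $T: R^2 \to R^3$ be the linear transformation defined by
\[ T\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \begin{bmatrix} x_2 \\ -5x_1 + 13x_2 \\ -7x_1 + 16x_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -5 & 13 \\ -7 & 16 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

Find the matrix for the transformation T with respect to the bases $B = \{u_1, u_2\}$ for $R^2$ and $B' = \{v_1, v_2, v_3\}$ for $R^3$, where
\[ u_1 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 5 \\ 2 \end{bmatrix}; \qquad v_1 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -1 \\ 2 \\ 2 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} \]
Solution From the formula for T,
\[ T(u_1) = \begin{bmatrix} 1 \\ -2 \\ -5 \end{bmatrix}, \quad T(u_2) = \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix} \]

Expressing these vectors as linear combinations of $v_1$, $v_2$, and $v_3$, we obtain (verify)
\[ T(u_1) = v_1 - 2v_3, \quad T(u_2) = 3v_1 + v_2 - v_3 \]

Thus,
\[ [T(u_1)]_{B'} = \begin{bmatrix} 1 \\ 0 \\ -2 \end{bmatrix}, \quad [T(u_2)]_{B'} = \begin{bmatrix} 3 \\ 1 \\ -1 \end{bmatrix} \]
so
\[ [T]_{B',B} = \left[\, [T(u_1)]_{B'} \mid [T(u_2)]_{B'} \,\right] = \begin{bmatrix} 1 & 3 \\ 0 & 1 \\ -2 & -1 \end{bmatrix} \]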


Remark Example 3 illustrates that a fixed linear transformation generally has multiple representations, each depending on the bases chosen. In this case the matrices

\[ [T] = \begin{bmatrix} 0 & 1 \\ -5 & 13 \\ -7 & 16 \end{bmatrix} \quad \text{and} \quad [T]_{B',B} = \begin{bmatrix} 1 & 3 \\ 0 & 1 \\ -2 & -1 \end{bmatrix} \]

both represent the transformation T, the first relative to the standard bases for $R^2$ and $R^3$, the second relative to the bases B and $B'$ stated in the example.
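
The "(verify)" step in Example 3 amounts to solving a small linear system for each image vector. A minimal Python sketch follows, using exact fractions and a hand-written Gauss-Jordan solver; the helper name `solve` and the data layout are assumptions for illustration, and the basis vectors are the ones consistent with the images T(u1) and T(u2) stated in the example.

```python
from fractions import Fraction

def solve(M, b):
    """Solve M c = b by Gauss-Jordan elimination (M square and invertible)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(b[i])] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def T(x):
    x1, x2 = x
    return [x2, -5 * x1 + 13 * x2, -7 * x1 + 16 * x2]

u = [[3, 1], [5, 2]]                          # basis B of R^2
v = [[1, 0, -1], [-1, 2, 2], [0, 1, 2]]       # basis B' of R^3
V = [[v[j][i] for j in range(3)] for i in range(3)]   # columns are v1, v2, v3

# Column j of [T]_{B',B} is the coordinate vector of T(u_j) relative to B'.
cols = [solve(V, T(uj)) for uj in u]
A = [[cols[j][i] for j in range(2)] for i in range(3)]
print(A)
```

Each call to `solve` finds the weights c1, c2, c3 with c1 v1 + c2 v2 + c3 v3 = T(u_j), which is exactly the coordinate vector required by Formula 4.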


Matrices of Linear Operators 


In the special case where V = W (so that $T: V \to V$ is a linear operator), it is usual to take $B = B'$ when constructing a matrix for T. In this case the resulting matrix is called the matrix for T relative to the basis B and is usually denoted by $[T]_B$ rather than $[T]_{B,B}$. If $B = \{u_1, u_2, \ldots, u_n\}$, then Formulas 4 and 5 become

\[ [T]_B = \left[\, [T(u_1)]_B \mid [T(u_2)]_B \mid \cdots \mid [T(u_n)]_B \,\right] \tag{7} \]

\[ [T]_B\,[x]_B = [T(x)]_B \tag{8} \]

Phrased informally, Formulas 7 and 8 state that the matrix for T, when multiplied by the coordinate vector for x, produces the coordinate vector for T(x).

In the special case where $T: R^n \to R^n$ is a matrix operator, say multiplication by A, and B is the standard basis for $R^n$, then Formula 7 simplifies to

\[ [T]_B = A \tag{9} \]


Matrices of Identity Operators 


Recall that the identity operator $I: V \to V$ maps every vector in V into itself, that is, $I(x) = x$ for every vector x in V. The following example shows that if V is n-dimensional, then the matrix for I relative to any basis B for V is the $n \times n$ identity matrix.


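EXAMPLE 4 Matrices of Identity Operators


If $B = \{u_1, u_2, \ldots, u_n\}$ is a basis for a finite-dimensional vector space V, and if $I: V \to V$ is the identity operator on V, then
\[ I(u_1) = u_1, \quad I(u_2) = u_2, \quad \ldots, \quad I(u_n) = u_n \]
Therefore,
\[ [I]_B = \left[\, [I(u_1)]_B \mid [I(u_2)]_B \mid \cdots \mid [I(u_n)]_B \,\right] = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I \]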


EXAMPLE 5 Linear Operator on $P_2$


Let $T: P_2 \to P_2$ be the linear operator defined by
\[ T(p(x)) = p(3x - 5) \]
that is, $T(c_0 + c_1 x + c_2 x^2) = c_0 + c_1(3x - 5) + c_2(3x - 5)^2$.

(a) Find $[T]_B$ relative to the basis $B = \{1, x, x^2\}$.
(b) Use the indirect procedure to compute $T(1 + 2x + 3x^2)$.
(c) Check the result in (b) by computing $T(1 + 2x + 3x^2)$ directly.


Solution

(a) From the formula for T,
\[ T(1) = 1, \quad T(x) = 3x - 5, \quad T(x^2) = (3x - 5)^2 = 9x^2 - 30x + 25 \]
so
\[ [T(1)]_B = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad [T(x)]_B = \begin{bmatrix} -5 \\ 3 \\ 0 \end{bmatrix}, \quad [T(x^2)]_B = \begin{bmatrix} 25 \\ -30 \\ 9 \end{bmatrix} \]
Thus,
\[ [T]_B = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix} \]
(b) Step 1. The coordinate matrix for $p = 1 + 2x + 3x^2$ relative to the basis $B = \{1, x, x^2\}$ is
\[ [p]_B = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \]
Step 2. Multiplying $[p]_B$ by the matrix $[T]_B$ found in part (a) we obtain
\[ [T]_B[p]_B = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 66 \\ -84 \\ 27 \end{bmatrix} = [T(p)]_B \]

Step 3. Reconstructing $T(p) = T(1 + 2x + 3x^2)$ from $[T(p)]_B$ we obtain
\[ T(1 + 2x + 3x^2) = 66 - 84x + 27x^2 \]
(c) By direct computation,
\[ T(1 + 2x + 3x^2) = 1 + 2(3x - 5) + 3(3x - 5)^2 \]
\[ = 1 + 6x - 10 + 27x^2 - 90x + 75 \]
\[ = 66 - 84x + 27x^2 \]
which agrees with the result in (b).
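
The comparison between the indirect and direct computations in Example 5 can be reproduced with a short Python sketch. Polynomials are again assumed to be stored as coefficient lists, and the helpers `poly_mul`, `compose_affine`, and `mat_vec` are hypothetical names introduced only for this illustration.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def compose_affine(p, a, b):
    """Return the coefficients of p(a*x + b), truncated to the length of p."""
    result = [0] * len(p)
    power = [1]                               # (a*x + b)^0
    for c in p:
        for i, coeff in enumerate(power):
            result[i] += c * coeff
        power = poly_mul(power, [b, a])       # next power of (a*x + b)
    return result

def mat_vec(A, v):
    return [sum(r * x for r, x in zip(row, v)) for row in A]

# Part (a): columns of [T]_B are the images of 1, x, x^2 under p(x) -> p(3x - 5).
columns = [compose_affine(e, 3, -5) for e in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]
T_B = [[col[i] for col in columns] for i in range(3)]

indirect = mat_vec(T_B, [1, 2, 3])            # part (b): multiply [T]_B by [p]_B
direct = compose_affine([1, 2, 3], 3, -5)     # part (c): substitute directly
print(T_B, indirect, direct)
```

Both routes return the same coefficient list, which is the content of Formula 8 for this operator.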


Matrices of Compositions and Inverse Transformations 


We will conclude this section by mentioning two theorems without proof that are generalizations of Formulas 4 and 7 of 
Section 4.10. 


THEOREM 8.4.1 


If $T_1: U \to V$ and $T_2: V \to W$ are linear transformations, and if B, $B''$, and $B'$ are bases for U, V, and W, respectively, then

\[ [T_2 \circ T_1]_{B',B} = [T_2]_{B',B''}\,[T_1]_{B'',B} \tag{10} \]


THEOREM 8.4.2 


If $T: V \to V$ is a linear operator, and if B is a basis for V, then the following are equivalent.
(a) T is one-to-one.
(b) $[T]_B$ is invertible.

Moreover, when these equivalent conditions hold,

\[ [T^{-1}]_B = [T]_B^{-1} \tag{11} \]
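
Formula 11 can be checked numerically on the operator T(p(x)) = p(3x - 5) from Example 5, whose inverse is the substitution x -> (x + 5)/3. The sketch below is only an illustration: the `inverse` helper is a hand-written Gauss-Jordan routine using exact fractions, not a library call, and its name is an assumption.

```python
from fractions import Fraction

def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination with exact fractions."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular, so T is not one-to-one")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

T_B = [[1, -5, 25], [0, 3, -30], [0, 0, 9]]    # [T]_B from Example 5
T_inv_B = inverse(T_B)                         # candidate for [T^{-1}]_B

# Applying the inverse matrix to [T(p)]_B should recover [p]_B for p = 1 + 2x + 3x^2.
recovered = mat_vec(T_inv_B, [66, -84, 27])
print(T_inv_B, recovered)
```

The columns of `T_inv_B` are the coordinate vectors of 1, (x + 5)/3, and (x + 5)^2/9, as expected for the inverse substitution.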


Remark In 10, observe how the interior subscript $B''$ (the basis for the intermediate space V) seems to “cancel out,” leaving only the bases for the domain and image space of the composition as subscripts (Figure 8.4.5). This cancellation of interior subscripts suggests the following extension of Formula 10 to compositions of three linear transformations (Figure 8.4.6):

\[ [T_3 \circ T_2 \circ T_1]_{B',B} = [T_3]_{B',B'''}\,[T_2]_{B''',B''}\,[T_1]_{B'',B} \tag{12} \]

Figure 8.4.5 [Diagram: in $[T_2 \circ T_1]_{B',B} = [T_2]_{B',B''}[T_1]_{B'',B}$ the interior subscripts $B''$ cancel.]

Figure 8.4.6 [Diagram: $T_1$, $T_2$, $T_3$ applied in succession between spaces with bases B, $B''$, $B'''$, and $B'$.]


The following example illustrates Theorem 8.4.1. 


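EXAMPLE 6 Composition


Let $T_1: P_1 \to P_2$ be the linear transformation defined by
\[ T_1(p(x)) = x\,p(x) \]
and let $T_2: P_2 \to P_2$ be the linear operator defined by
\[ T_2(p(x)) = p(3x - 5) \]
Then the composition $(T_2 \circ T_1): P_1 \to P_2$ is given by

\[ (T_2 \circ T_1)(p(x)) = T_2(T_1(p(x))) = T_2(x\,p(x)) = (3x - 5)\,p(3x - 5) \]
Thus, if $p(x) = c_0 + c_1 x$, then
\[ (T_2 \circ T_1)(c_0 + c_1 x) = (3x - 5)(c_0 + c_1(3x - 5)) = c_0(3x - 5) + c_1(3x - 5)^2 \tag{13} \]

In this example, $P_1$ plays the role of U in Theorem 8.4.1, and $P_2$ plays the roles of both V and W; thus we can take $B' = B''$ in 10 so that the formula simplifies to

\[ [T_2 \circ T_1]_{B',B} = [T_2]_{B'}\,[T_1]_{B',B} \tag{14} \]

Let us choose $B = \{1, x\}$ to be the basis for $P_1$ and choose $B' = \{1, x, x^2\}$ to be the basis for $P_2$. We showed in Examples 1 and 5 that

\[ [T_1]_{B',B} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad [T_2]_{B'} = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix} \]
Thus, it follows from 14 that
\[ [T_2 \circ T_1]_{B',B} = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -5 & 25 \\ 3 & -30 \\ 0 & 9 \end{bmatrix} \tag{15} \]

As a check, we will calculate $[T_2 \circ T_1]_{B',B}$ directly from Formula 4. Since $B = \{1, x\}$, it follows from Formula 4 with $u_1 = 1$ and $u_2 = x$ that

\[ [T_2 \circ T_1]_{B',B} = \left[\, [(T_2 \circ T_1)(1)]_{B'} \mid [(T_2 \circ T_1)(x)]_{B'} \,\right] \tag{16} \]
Using 13 yields

\[ (T_2 \circ T_1)(1) = 3x - 5 \quad \text{and} \quad (T_2 \circ T_1)(x) = (3x - 5)^2 = 9x^2 - 30x + 25 \]

From this and the fact that $B' = \{1, x, x^2\}$, it follows that

\[ [(T_2 \circ T_1)(1)]_{B'} = \begin{bmatrix} -5 \\ 3 \\ 0 \end{bmatrix} \quad \text{and} \quad [(T_2 \circ T_1)(x)]_{B'} = \begin{bmatrix} 25 \\ -30 \\ 9 \end{bmatrix} \]
Substituting in 16 yields
\[ [T_2 \circ T_1]_{B',B} = \begin{bmatrix} -5 & 25 \\ 3 & -30 \\ 0 & 9 \end{bmatrix} \]

which agrees with 15.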
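
The matrix product in Example 6 is easy to confirm mechanically. The following Python sketch multiplies the two matrices from Examples 1 and 5 and compares the result with the columns obtained directly from the composition; the helper name `mat_mul` is an assumption for illustration.

```python
def mat_mul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

T1 = [[0, 0], [1, 0], [0, 1]]                    # [T1]_{B',B} from Example 1
T2 = [[1, -5, 25], [0, 3, -30], [0, 0, 9]]       # [T2]_{B'} from Example 5

product = mat_mul(T2, T1)                        # right-hand side of the composition formula

# Direct route: coordinates of (T2 o T1)(1) = 3x - 5 and (T2 o T1)(x) = 9x^2 - 30x + 25.
direct_columns = [[-5, 3, 0], [25, -30, 9]]
direct = [[col[i] for col in direct_columns] for i in range(3)]
print(product, direct)
```

Note the order of the factors: the matrix of the transformation applied first, T1, stands on the right.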


Concept Review 
• Matrix for a linear transformation relative to bases
• Matrix for a linear operator relative to a basis
• The three-step procedure for finding T(x)

Skills

• Find the matrix for a linear transformation $T: V \to W$ relative to bases of V and W.
• For a linear transformation $T: V \to W$, find T(x) using the matrix for T relative to bases of V and W.


Exercise Set 8.4 


1. Let $T: P_2 \to P_3$ be the linear transformation defined by $T(p(x)) = x\,p(x)$.
(a) Find the matrix for T relative to the standard bases
\[ B = \{u_1, u_2, u_3\} \quad \text{and} \quad B' = \{v_1, v_2, v_3, v_4\} \]
where
\[ u_1 = 1, \quad u_2 = x, \quad u_3 = x^2 \]
\[ v_1 = 1, \quad v_2 = x, \quad v_3 = x^2, \quad v_4 = x^3 \]

(b) Verify that the matrix $[T]_{B',B}$ obtained in part (a) satisfies Formula 5 for every vector $x = c_0 + c_1 x + c_2 x^2$ in $P_2$.


2. Let $T: P_2 \to P_1$ be the linear transformation defined by
\[ T(a_0 + a_1 x + a_2 x^2) = (a_0 + a_1) - (2a_1 + 3a_2)x \]

(a) Find the matrix for T relative to the standard bases $B = \{1, x, x^2\}$ and $B' = \{1, x\}$ for $P_2$ and $P_1$.
(b) Verify that the matrix $[T]_{B',B}$ obtained in part (a) satisfies Formula 5 for every vector $x = c_0 + c_1 x + c_2 x^2$ in $P_2$.
3. Let $T: P_2 \to P_2$ be the linear operator defined by
\[ T(a_0 + a_1 x + a_2 x^2) = a_0 + a_1(x - 1) + a_2(x - 1)^2 \]

(a) Find the matrix for T relative to the standard basis $B = \{1, x, x^2\}$ for $P_2$.

(b) Verify that the matrix $[T]_B$ obtained in part (a) satisfies Formula 8 for every vector $x = a_0 + a_1 x + a_2 x^2$ in $P_2$.


Answer:

(a) \[ \begin{bmatrix} 1 & -1 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{bmatrix} \]


4, Let 7: R2 _, R2 be the linear operator defined by 


and let 8= {uy, uz} be the basis for which 


(a) Find [7] p. 
(b) Verify that Formula 8 holds for every vector x in 22. 


5. Let 7. R2 _, PR? be defined by 


(a) Find the matrix [7] p' p relative to the bases 8= {uj, ug} and 3B’ = {¥1, V2, v3h, where 


-f} [3 


(b) Verify that Formula 5 holds for every vector in 22. 


Answer: 
(a) oa) 
1 
=5 1 
g 4 
3 3 


6. Let $T: R^3 \to R^3$ be the linear operator defined by
\[ T(x_1, x_2, x_3) = (x_1 - x_2,\; x_2 - x_1,\; x_1 - x_3) \]

(a) Find the matrix for T with respect to the basis $B = \{v_1, v_2, v_3\}$, where
\[ v_1 = (1, 0, 1), \quad v_2 = (0, 1, 1), \quad v_3 = (1, 1, 0) \]

(b) Verify that Formula 8 holds for every vector $x = (x_1, x_2, x_3)$ in $R^3$.

(c) Is T one-to-one? If so, find the matrix of $T^{-1}$ with respect to the basis B.


7. Let $T: P_2 \to P_2$ be the linear operator defined by $T(p(x)) = p(2x + 1)$, that is,
\[ T(c_0 + c_1 x + c_2 x^2) = c_0 + c_1(2x + 1) + c_2(2x + 1)^2 \]

(a) Find $[T]_B$ with respect to the basis $B = \{1, x, x^2\}$.

(b) Use the three-step procedure illustrated in Example 2 to compute $T(2 - 3x + 4x^2)$.

(c) Check the result obtained in part (b) by computing $T(2 - 3x + 4x^2)$ directly.


Answer:

(a) \[ \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 4 \\ 0 & 0 & 4 \end{bmatrix} \]
(b) $3 + 10x + 16x^2$

8. Let $T: P_2 \to P_3$ be the linear transformation defined by $T(p(x)) = x\,p(x - 3)$, that is,
\[ T(c_0 + c_1 x + c_2 x^2) = x\left(c_0 + c_1(x - 3) + c_2(x - 3)^2\right) \]
(a) Find $[T]_{B',B}$ relative to the bases $B = \{1, x, x^2\}$ and $B' = \{1, x, x^2, x^3\}$.

(b) Use the three-step procedure illustrated in Example 2 to compute $T(1 + x - x^2)$.
(c) Check the result obtained in part (b) by computing $T(1 + x - x^2)$ directly.


2 Let ¥; = 3 | and v3 = ee and let 


k<3 
A= 
[-2 5| 
be the matrix for 7. 22 _, R? relative to the basis B= {v1, v2}. 


(a) Find [7(¥1)] gand [T(v2)] p 
(b) Find 7(¥,) and T(¥v3). 


(c) Find a formula for (Pal : 


(@) Use the formula obtained in (c) to compute 7|| } 


Answer: 


® erole=|_3), (7oale=|2| 
® ro =|_3]. ro =[75| 


29 
(c) 


10. 3 =2 10 
LetA=| 1 6 2 1] be the matrix for 7. p4_, R3 relative to the bases B= {¥1, v2, v3, ¥4} and 
—3 O7 1 
B= {w1, W?, w3}, where 
0 2 1 6 
o 1 ee 1 wie 4 ao 9 
1 =-1 -1/ 4 
1 =1 2 2 
0 -7 —6 
w,=|8 w2=| 8], wa=| 9 
8 1 1 


(a) Find [T(¥1)] p’, [Tv2)] pr, [T(w3)] pr. and [T(w4)] p°- 
(b) Find T(v1), T(v2), T(w3), and Piv4). 


(c) *1 
x2 
Find a formula for 7 
x3 
x4 
(d) 2 
Use the formula obtained in (c) to compute 7 : 
0 
11. Let
\[ A = \begin{bmatrix} 1 & 3 & -1 \\ 2 & 0 & 5 \\ 6 & -2 & 4 \end{bmatrix} \]
be the matrix for $T: P_2 \to P_2$ with respect to the basis $B = \{v_1, v_2, v_3\}$, where
\[ v_1 = 3x + 3x^2, \quad v_2 = -1 + 3x + 2x^2, \quad v_3 = 3 + 7x + 2x^2 \]

(a) Find $[T(v_1)]_B$, $[T(v_2)]_B$, and $[T(v_3)]_B$.
(b) Find $T(v_1)$, $T(v_2)$, and $T(v_3)$.
(c) Find a formula for $T(a_0 + a_1 x + a_2 x^2)$.
(d) Use the formula obtained in (c) to compute $T(1 + x^2)$.


Answer: 


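(a) \[ [T(v_1)]_B = \begin{bmatrix} 1 \\ 2 \\ 6 \end{bmatrix}, \quad [T(v_2)]_B = \begin{bmatrix} 3 \\ 0 \\ -2 \end{bmatrix}, \quad [T(v_3)]_B = \begin{bmatrix} -1 \\ 5 \\ 4 \end{bmatrix} \]

(b) $T(v_1) = 16 + 51x + 19x^2$, $T(v_2) = -6 - 5x + 5x^2$, $T(v_3) = 7 + 40x + 15x^2$

(c) \[ T(a_0 + a_1 x + a_2 x^2) = \frac{239a_0 - 161a_1 + 289a_2}{24} + \frac{201a_0 - 111a_1 + 247a_2}{8}\,x + \frac{61a_0 - 31a_1 + 107a_2}{12}\,x^2 \]

(d) $T(1 + x^2) = 22 + 56x + 14x^2$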


12. Let $T_1: P_1 \to P_2$ be the linear transformation defined by
\[ T_1(p(x)) = x\,p(x) \]
and let $T_2: P_2 \to P_2$ be the linear operator defined by
\[ T_2(p(x)) = p(2x + 1) \]
Let $B = \{1, x\}$ and $B' = \{1, x, x^2\}$ be the standard bases for $P_1$ and $P_2$.
(a) Find $[T_2 \circ T_1]_{B',B}$, $[T_2]_{B'}$, and $[T_1]_{B',B}$.
(b) State a formula relating the matrices in part (a).
(c) Verify that the matrices in part (a) satisfy the formula you stated in part (b).


13. Let $T_1: P_1 \to P_2$ be the linear transformation defined by
\[ T_1(c_0 + c_1 x) = 2c_0 - 3c_1 x \]
and let $T_2: P_2 \to P_3$ be the linear transformation defined by
\[ T_2(c_0 + c_1 x + c_2 x^2) = 3c_0 x + 3c_1 x^2 + 3c_2 x^3 \]
Let $B = \{1, x\}$, $B'' = \{1, x, x^2\}$, and $B' = \{1, x, x^2, x^3\}$.

(a) Find $[T_2 \circ T_1]_{B',B}$, $[T_2]_{B',B''}$, and $[T_1]_{B'',B}$.
(b) State a formula relating the matrices in part (a).
(c) Verify that the matrices in part (a) satisfy the formula you stated in part (b).


Answer:
(a) \[ [T_2 \circ T_1]_{B',B} = \begin{bmatrix} 0 & 0 \\ 6 & 0 \\ 0 & -9 \\ 0 & 0 \end{bmatrix}, \quad [T_2]_{B',B''} = \begin{bmatrix} 0 & 0 & 0 \\ 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}, \quad [T_1]_{B'',B} = \begin{bmatrix} 2 & 0 \\ 0 & -3 \\ 0 & 0 \end{bmatrix} \]

(b) $[T_2 \circ T_1]_{B',B} = [T_2]_{B',B''}\,[T_1]_{B'',B}$


14. Show that if $T: V \to W$ is the zero transformation, then the matrix for T with respect to any bases for V and W is a zero matrix.


15. Show that if $T: V \to V$ is a contraction or a dilation of V (Example 4 of Section 8.1), then the matrix for T relative to any basis for V is a positive scalar multiple of the identity matrix.


16. Let $B = \{v_1, v_2, v_3, v_4\}$ be a basis for a vector space V. Find the matrix with respect to B of the linear operator $T: V \to V$ defined by $T(v_1) = v_2$, $T(v_2) = v_3$, $T(v_3) = v_4$, $T(v_4) = v_1$.


17. Prove that if B and $B'$ are the standard bases for $R^n$ and $R^m$, respectively, then the matrix for a linear transformation $T: R^n \to R^m$ relative to the bases B and $B'$ is the standard matrix for T.


18. (Calculus required) Let $D: P_2 \to P_2$ be the differentiation operator $D(p) = p'(x)$. In parts (a) and (b), find the matrix of D relative to the basis $B = \{p_1, p_2, p_3\}$.

(a) $p_1 = 1$, $p_2 = x$, $p_3 = x^2$

(b) $p_1 = 2$, $p_2 = 2 - 3x$, $p_3 = 2 - 3x + 8x^2$

(c) Use the matrix in part (a) to compute $D(6 - 6x + 24x^2)$.

(d) Repeat the directions for part (c) for the matrix in part (b).


19. (Calculus required) In each part, suppose that $B = \{f_1, f_2, f_3\}$ is a basis for a subspace V of the vector space of real-valued functions defined on the real line. Find the matrix with respect to B for the differentiation operator $D: V \to V$.
(a) $f_1 = 1$, $f_2 = \sin x$, $f_3 = \cos x$

(b) $f_1 = 1$, $f_2 = e^x$, $f_3 = e^{2x}$

(c) $f_1 = e^{2x}$, $f_2 = xe^{2x}$, $f_3 = x^2 e^{2x}$

(d) Use the matrix in part (c) to compute $D(4e^{2x} + 6xe^{2x} - 10x^2 e^{2x})$.


Answer:
(a) \[ \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix} \]
(b) \[ \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \]
(c) \[ \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 2 \end{bmatrix} \]
(d) $14e^{2x} - 8xe^{2x} - 20x^2 e^{2x}$, since
\[ \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} 4 \\ 6 \\ -10 \end{bmatrix} = \begin{bmatrix} 14 \\ -8 \\ -20 \end{bmatrix} \]


20. Let V be a four-dimensional vector space with basis B, let W be a seven-dimensional vector space with basis $B'$, and let $T: V \to W$ be a linear transformation. Identify the four vector spaces that contain the vectors at the corners of the accompanying diagram.

Figure Ex-20 [Diagram: the direct computation x → T(x) across the top, and $[x]_B$ → (multiply by $[T]_{B',B}$) → $[T(x)]_{B'}$ across the bottom.]


21. In each part, fill in the missing part of the equation.
(a) $[T_2 \circ T_1]_{B',B} = [T_2]_{\,\_\_\_}\,[T_1]_{B'',B}$
(b) $[T_3 \circ T_2 \circ T_1]_{B',B} = [T_3]_{\,\_\_\_}\,[T_2]_{B''',B''}\,[T_1]_{B'',B}$


Answer:

(a) $B', B''$
(b) $B', B'''$

True-False Exercises 


In parts (a)—(e) determine whether the statement is true or false, and justify your answer. 


() If the matrix of a linear transformation 7:  —, }# relative to some bases of V and Wis F 3 then there is a nonzero 


vector x in V such that T(x) = 2x. 
Answer: 


False 


(b} If the matrix of a linear transformation 7: /” —, }¥ relative to bases for V and W is E 3 then there is a nonzero 
vector x in V such that T(x) = 4x. 
Answer: 
False 


() If the matrix of a linear transformation 7: ” —, } relative to certain bases for V and W is E a then T is one-to-one. 


3 


Answer: 


True 


(d) If $S: V \to V$ and $T: V \to V$ are linear operators and B is a basis for V, then the matrix of $S \circ T$ relative to B is $[T]_B[S]_B$.
Answer: 
False 

(e) If $T: V \to V$ is an invertible linear operator and B is a basis for V, then the matrix for $T^{-1}$ relative to B is $[T]_B^{-1}$.


Answer: 


True 




8.5 Similarity 


The matrix for a linear operator $T: V \to V$ depends on the basis selected for V. One of the fundamental problems of linear algebra is to choose a basis for V that makes the matrix for T as simple as possible, for example a diagonal or a triangular matrix. In this section we will study this problem.


Simple Matrices for Linear Operators 


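Standard bases do not necessarily produce the simplest matrices for linear operators. For example, consider the matrix operator $T: R^2 \to R^2$ whose standard matrix is
\[ [T] = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \tag{1} \]

and view [T] as the matrix for T relative to the standard basis $B = \{e_1, e_2\}$ for $R^2$. Let us compare this to the matrix for T relative to the basis $B' = \{u_1', u_2'\}$ for $R^2$ in which

\[ u_1' = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad u_2' = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \tag{2} \]

Since
\[ T(u_1') = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} = 2u_1' \quad \text{and} \quad T(u_2') = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \end{bmatrix} = 3u_2' \]
it follows that
\[ [T(u_1')]_{B'} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} \quad \text{and} \quad [T(u_2')]_{B'} = \begin{bmatrix} 0 \\ 3 \end{bmatrix} \]

so the matrix for T relative to the basis $B'$ is

\[ [T]_{B'} = \left[\, [T(u_1')]_{B'} \mid [T(u_2')]_{B'} \,\right] = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \]

This matrix, being diagonal, has a simpler form than [T] and conveys clearly that the operator T scales $u_1'$ by a factor of 2 and $u_2'$ by a factor of 3, information that is not immediately evident from [T].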


One of the major themes in more advanced linear algebra courses is to determine the “simplest possible form” that can be 
obtained for the matrix of a linear operator by choosing the basis appropriately. Sometimes it is possible to obtain a 
diagonal matrix (as above, for example), whereas other times one must settle for a triangular matrix or some other form. 
We will only be able to touch on this important topic in this text. 


The problem of finding a basis that produces the simplest possible matrix for a linear operator $T: V \to V$ can be attacked by first finding a matrix for T relative to any basis, typically a standard basis, where applicable, and then changing the basis in a way that simplifies the matrix. Before pursuing this idea, it will be helpful to revisit some concepts about changing bases.


A New View of Transition Matrices 


Recall from Formulas 7 and 8 of Section 4.6 that if $B = \{u_1, u_2, \ldots, u_n\}$ and $B' = \{u_1', u_2', \ldots, u_n'\}$ are bases for a vector space V, then the transition matrices from B to $B'$ and from $B'$ to B are

\[ P_{B \to B'} = \left[\, [u_1]_{B'} \mid [u_2]_{B'} \mid \cdots \mid [u_n]_{B'} \,\right] \tag{3} \]

\[ P_{B' \to B} = \left[\, [u_1']_B \mid [u_2']_B \mid \cdots \mid [u_n']_B \,\right] \tag{4} \]

where the matrices $P_{B \to B'}$ and $P_{B' \to B}$ are inverses of each other. We also showed in Formulas 9 and 10 of that section that if v is any vector in V, then

\[ P_{B \to B'}\,[v]_B = [v]_{B'} \tag{5} \]

\[ P_{B' \to B}\,[v]_{B'} = [v]_B \tag{6} \]

The following theorem shows that transition matrices in Formulas 3 and 4 can be viewed as matrices for identity operators.


THEOREM 8.5.1 


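If B and $B'$ are bases for a finite-dimensional vector space V, and if $I: V \to V$ is the identity operator on V, then

\[ P_{B \to B'} = [I]_{B',B} \quad \text{and} \quad P_{B' \to B} = [I]_{B,B'} \]

Proof Suppose that $B = \{u_1, u_2, \ldots, u_n\}$ and $B' = \{u_1', u_2', \ldots, u_n'\}$ are bases for V. Using the fact that $I(v) = v$ for all v in V, it follows from Formula 4 of Section 8.4 that

\[ [I]_{B',B} = \left[\, [I(u_1)]_{B'} \mid [I(u_2)]_{B'} \mid \cdots \mid [I(u_n)]_{B'} \,\right] \]
\[ = \left[\, [u_1]_{B'} \mid [u_2]_{B'} \mid \cdots \mid [u_n]_{B'} \,\right] \]
\[ = P_{B \to B'} \qquad \text{[Formula (3) above]} \]
The proof that $[I]_{B,B'} = P_{B' \to B}$ is similar.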


Effect of Changing Bases on Matrices of Linear Operators 


We are now ready to consider the main problem in this section. 


PROBLEM 
If B and $B'$ are two bases for a finite-dimensional vector space V, and if $T: V \to V$ is a linear operator, what relationship, if any, exists between the matrices $[T]_B$ and $[T]_{B'}$?


The answer to this question can be obtained by considering the composition of the three linear operators on V pictured in 
Figure 8.5.1. 


Figure 8.5.1 [Diagram: v → v → T(v) → T(v) under the operators I, T, I applied in succession; all four spaces are V, with bases $B'$, B, B, $B'$ from left to right.]


In this figure, v is first mapped into itself by the identity operator, then v is mapped into T(v) by T, and then T(v) is mapped into itself by the identity operator. All four vector spaces involved in the composition are the same (namely, V), but the bases for the spaces vary. Since the starting vector is v and the final vector is T(v), the composition produces the same result as applying T directly; that is,
\[ T = I \circ T \circ I \tag{7} \]
If, as illustrated in Figure 8.5.1, the first and last vector spaces are assigned the basis $B'$ and the middle two spaces are assigned the basis B, then it follows from 7 and Formula 12 of Section 8.4 (with an appropriate adjustment to the names of the bases) that
\[ [T]_{B',B'} = [I \circ T \circ I]_{B',B'} = [I]_{B',B}\,[T]_{B,B}\,[I]_{B,B'} \tag{8} \]
or, in simpler notation,
\[ [T]_{B'} = [I]_{B',B}\,[T]_B\,[I]_{B,B'} \tag{9} \]
We can simplify this formula even further by using Theorem 8.5.1 to rewrite it as

\[ [T]_{B'} = P_{B \to B'}\,[T]_B\,P_{B' \to B} \tag{10} \]


In summary, we have the following theorem. 


THEOREM 8.5.2 


Let $T: V \to V$ be a linear operator on a finite-dimensional vector space V, and let B and $B'$ be bases for V. Then
\[ [T]_{B'} = P^{-1}[T]_B P \tag{11} \]

where $P = P_{B' \to B}$ and $P^{-1} = P_{B \to B'}$.


Warning When applying Theorem 8.5.2, it is easy to forget whether $P = P_{B' \to B}$ (correct) or $P = P_{B \to B'}$ (incorrect). It may help to use the diagram in Figure 8.5.2 and observe that the exterior subscripts of the transition matrices match the subscript of the matrix they enclose.

Figure 8.5.2 [Diagram: in $[T]_{B'} = P_{B \to B'}\,[T]_B\,P_{B' \to B}$ the exterior subscripts of the transition matrices are both $B'$.]


In the terminology of Definition 1 of Section 5.2, Theorem 8.5.2 tells us that matrices representing the same linear operator relative to different bases must be similar. The following theorem is a rephrasing of Theorem 8.5.2 in the language of similarity.


THEOREM 8.5.3


Two matrices, A and B, are similar if and only if they represent the same linear operator. Moreover, if $B = P^{-1}AP$, then P is the transition matrix from the basis relative to matrix B to the basis relative to matrix A.


EXAMPLE 1 Similar Matrices Represent the Same Linear Operator 


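We showed at the beginning of this section that the matrices
\[ C = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \]

represent the same linear operator $T: R^2 \to R^2$. Verify that these matrices are similar by finding a matrix P for which $D = P^{-1}CP$.


Solution We need to find the transition matrix
\[ P = P_{B' \to B} = \left[\, [u_1']_B \mid [u_2']_B \,\right] \]
where $B' = \{u_1', u_2'\}$ is the basis for $R^2$ given by 2 and $B = \{e_1, e_2\}$ is the standard basis for $R^2$. We see by inspection that
\[ u_1' = e_1 + e_2 \]
\[ u_2' = e_1 + 2e_2 \]

from which it follows that
\[ [u_1']_B = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{and} \quad [u_2']_B = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \]
Thus,
\[ P = P_{B' \to B} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \]

We leave it for you to verify that
\[ P^{-1} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} \]
and hence that
\[ \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \]
that is, $D = P^{-1}CP$.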
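
The verification left to the reader in Example 1 can be sketched in Python. The standard matrix is taken here to be [[1, 1], [-2, 4]], which is an assumption consistent with the scaling factors 2 and 3 and the basis vectors e1 + e2 and e1 + 2e2 described in this section; the helper names are hypothetical.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse_2x2(M):
    """Invert a 2 x 2 matrix using the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

C = [[1, 1], [-2, 4]]     # assumed standard matrix of T
P = [[1, 1], [1, 2]]      # columns are [u1']_B and [u2']_B

P_inv = inverse_2x2(P)
D = mat_mul(mat_mul(P_inv, C), P)   # P^{-1} C P
print(P_inv, D)
```

The columns of `P` are the new basis vectors written in the old (standard) basis, which is the direction singled out in the warning after Theorem 8.5.2.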


Similarity Invariants 


Recall from Section 5.2 that a property of a square matrix is called a similarity invariant if that property is shared by all similar matrices. In Table 1 of that section (table reproduced below), we listed the most important similarity invariants. Since we know from Theorem 8.5.3 that two matrices are similar if and only if they represent the same linear operator $T: V \to V$, it follows that if B and $B'$ are bases for V, then every similarity invariant property of $[T]_B$ is also a similarity invariant property of $[T]_{B'}$ for any other basis $B'$ for V. For example, for any two bases B and $B'$ we must have

\[ \det([T]_B) = \det([T]_{B'}) \]
It follows from this equation that the value of the determinant depends on T, but not on the particular basis that is used to obtain the matrix for T. Thus, the determinant can be regarded as a property of the linear operator T; indeed, if V is a finite-dimensional vector space, then we can define the determinant of the linear operator T to be

\[ \det(T) = \det([T]_B) \tag{12} \]

where B is any basis for V.


Table 1 Similarity Invariants

Property | Description
Determinant | A and $P^{-1}AP$ have the same determinant.
Invertibility | A is invertible if and only if $P^{-1}AP$ is invertible.
Rank | A and $P^{-1}AP$ have the same rank.
Nullity | A and $P^{-1}AP$ have the same nullity.
Trace | A and $P^{-1}AP$ have the same trace.
Characteristic polynomial | A and $P^{-1}AP$ have the same characteristic polynomial.
Eigenvalues | A and $P^{-1}AP$ have the same eigenvalues.
Eigenspace dimension | If $\lambda$ is an eigenvalue of A and $P^{-1}AP$, then the eigenspace of A corresponding to $\lambda$ and the eigenspace of $P^{-1}AP$ corresponding to $\lambda$ have the same dimension.
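
Two of the invariants in Table 1, the determinant and the trace, are simple enough to spot-check numerically. The Python sketch below conjugates a 2 × 2 matrix by an invertible matrix and compares both quantities; the particular matrices `A` and `P` are arbitrary illustrative choices (assumptions), and a numerical check of one case is of course not a proof.

```python
from fractions import Fraction

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det_2x2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def trace(M):
    return M[0][0] + M[1][1]

def inverse_2x2(M):
    d = Fraction(det_2x2(M))
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

A = [[1, 1], [-2, 4]]     # illustrative matrix
P = [[2, 1], [1, 1]]      # illustrative invertible matrix (det = 1)

similar = mat_mul(mat_mul(inverse_2x2(P), A), P)   # P^{-1} A P
print(similar, det_2x2(A), det_2x2(similar), trace(A), trace(similar))
```

The conjugated matrix looks different from `A`, yet its determinant and trace coincide with those of `A`, as the table predicts.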


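EXAMPLE 2 Determinant of a Linear Operator


At the beginning of this section we showed that the matrices

\[ [T] = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \quad \text{and} \quad [T]_{B'} = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \]

represent the same linear operator relative to different bases, the first relative to the standard basis $B = \{e_1, e_2\}$ for $R^2$ and the second relative to the basis $B' = \{u_1', u_2'\}$ for which

\[ u_1' = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad u_2' = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \]

This means that [T] and $[T]_{B'}$ must be similar matrices and hence must have the same similarity invariant properties. In particular, they must have the same determinant. We leave it for you to verify that

\[ \det[T] = \begin{vmatrix} 1 & 1 \\ -2 & 4 \end{vmatrix} = 6 \quad \text{and} \quad \det[T]_{B'} = \begin{vmatrix} 2 & 0 \\ 0 & 3 \end{vmatrix} = 6 \]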


EXAMPLE 3 Eigenvalues and Bases for Eigenspaces
Find the eigenvalues and bases for the eigenspaces of the linear operator $T: P_2 \to P_2$ defined by

\[ T(a + bx + cx^2) = -2c + (a + 2b + c)x + (a + 3c)x^2 \]

Solution We leave it for you to show that the matrix for T with respect to the standard basis $B = \{1, x, x^2\}$ is

\[ [T]_B = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix} \]

The eigenvalues of T are $\lambda = 1$ and $\lambda = 2$ (Example 7 of Section 5.1). Also from that example, the eigenspace of $[T]_B$ corresponding to $\lambda = 2$ has the basis $\{u_1, u_2\}$, where

\[ u_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \]
and the eigenspace of $[T]_B$ corresponding to $\lambda = 1$ has the basis $\{u_3\}$, where
\[ u_3 = \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} \]

The matrices $u_1$, $u_2$, and $u_3$ are the coordinate matrices relative to B of
\[ p_1 = -1 + x^2, \quad p_2 = x, \quad p_3 = -2 + x + x^2 \]

Thus, the eigenspace of T corresponding to $\lambda = 2$ has the basis

\[ \{p_1, p_2\} = \{-1 + x^2,\; x\} \]

and that corresponding to $\lambda = 1$ has the basis

\[ \{p_3\} = \{-2 + x + x^2\} \]

As a check, you can use the given formula for T to verify that

\[ T(p_1) = 2p_1, \quad T(p_2) = 2p_2, \quad \text{and} \quad T(p_3) = p_3 \]
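
The final check in Example 3 can also be carried out on coordinate vectors. The Python sketch below multiplies the matrix of T relative to the standard basis of P2 by each coordinate vector and compares the result with the corresponding scalar multiple; the variable names are illustrative assumptions.

```python
def mat_vec(A, v):
    """Multiply the matrix A (a list of rows) by the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Matrix of T(a + bx + cx^2) = -2c + (a + 2b + c)x + (a + 3c)x^2 relative to {1, x, x^2}.
T_B = [[0, 0, -2], [1, 2, 1], [1, 0, 3]]

# (eigenvalue, coordinate vector) pairs for p1 = -1 + x^2, p2 = x, p3 = -2 + x + x^2.
pairs = [(2, [-1, 0, 1]), (2, [0, 1, 0]), (1, [-2, 1, 1])]

checks = [mat_vec(T_B, u) == [lam * x for x in u] for lam, u in pairs]
print(checks)
```

Each entry of `checks` records whether the matrix sends the coordinate vector to the stated multiple of itself, which is Formula 8 of Section 8.4 applied to an eigenvector.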


Concept Review 

• Similarity of matrices representing a linear operator
• Similarity invariant
• Determinant of a linear operator

Skills

• Show that two matrices A and B represent the same linear operator, and find a transition matrix P so that $B = P^{-1}AP$.
• Find the eigenvalues and bases for the eigenspaces of a linear operator on a finite-dimensional vector space.


Exercise Set 8.5 


In Exercises 1–7, find the matrix for T relative to the basis B, and use Theorem 8.5.2 to compute the matrix for T relative to the basis $B'$.


1. 7: R2 _, R? is defined by 
“(a = * _ | 


Answer: 
a) oe 

1 =—2 11 11 

11 11 


2. FR? _, p? is defined by 


and B= {u, uz} and 3! = {¥1, ¥2}, where 
eel weH)!. Se Welt al 
fap oF pap [sp "4 La 


3. F- R2 _, PR? is the rotation about the origin through an angle of 45°; B and 3’ are the bases in Exercise 1. 


Answer: 
ee 13. __ 25 
nl V2)» \iy211y2 
y2 2 y2  11f'2 
4. 7. p3 _, p3 is defined by 
X41 x1 + 2x2—%3 
T\|x%2/)/= X32 
x3 x1 + 7x3 


and B is the standard basis for R? and B! = {¥1, V2, v3}, where 
1 1 
vy=|O0}, vo=]1], 
0 0 


5. T: RP? —, R? is the orthogonal projection on the xy-plane, and B and 3" are as in Exercise 4. 


Answer: 
10 0 10 0 
[T]p=]0 1 0}, [F]pr=]0 1 1 
00 0 00 0 
6. 7: R2 _, R2 is defined by T(x) = 5x, and B and 8" are the bases in Exercise 2. 


7. T:P — P, is defined by T(ag + ax) =ap + a(x +1), and B= (py, pz} and 8’ = fai. q2 |, where py =6+ 3x 
» pz = 10+ 2x, qi; = 2, q3 = 3 + 2x. 


Answer: 
2 2 
3 9 Ll 
2 3 

8. Find det(é). 


(a) TRA _, R4, where 7'(x1, x2) = (3x1 —4x3, —x1 + 7x3) 
(b) 7:R3 —, R4, where T(x1, x2, x3) = (x1 — 2X2, 42 —%3, 23-21) 
(c) T:P + Pa, where T(p(x)) = p(x = 1) 


9. Prove that the following are similarity invariants: 
(a) rank 
(b) nullity 
(c) invertibility 


10. Let 7: P4—+ P4 be the linear operator given by the formula T(p(x)}) = p(2x + 1). 
(a) Find a matrix for T relative to some convenient basis, and then use it to find the rank and nullity of 7. 


(b) Use the result in part (a) to determine whether T is one-to-one. 


11. In each part, find a basis for 2? relative to which the matrix for T is diagonal. 
(a) r x1]\ X{—%X2 
x2|} | 2xy +4x 
(b) pf[71]\ _ 4x1 —%x2 
x2]f =—3x1 + x2 
Answer: 


© {1} [3] 


(b) —3-/21 —3+ /21 
= 6 6 


12. In each part, find a basis for 7 relative to which the matrix for T is diagonal. 


(a) xy —2x;+ x%2- x3 


TT} %2])= X,—2x2— 1X3 
aa =x, — x%2—2x3 
(b) x4 —X%2 + X3 
T\| x2) [=] —%1 x3 
x3 x1 +22 
(c) x4 4x, 4x3 
T}| %2 | |= | 2xy + 3x9 + 2x3 
x3 


x1 +4x3 


13. Let 7: Pz — P3 be defined by 
Tay + ax + azx*} = (5ag + 6a, + 2a) 


_ (a: | Baa jx + (a0 = 2a2}x 


(a) Find the eigenvalues of T. 
(b) Find bases for the eigenspaces of T. 


Answer: 


(a) A= —4, A=3 
(b) Basis for eigenspace corresponding to \= —4: —24 ox -+- x; basis for eigenspace corresponding to 
A=3:5—2x 4x7 


14, Let 7: M3 — M9 be defined by 


(a) Find the eigenvalues of T. 
(b) Find bases for the eigenspaces of T. 


15. Let $\lambda$ be an eigenvalue of a linear operator $T: V \to V$. Prove that the eigenvectors of T corresponding to $\lambda$ are the nonzero vectors in the kernel of $\lambda I - T$.


16. (a) Prove that if A and B are similar matrices, then $A^2$ and $B^2$ are also similar. More generally, prove that $A^k$ and $B^k$ are similar if k is any positive integer.

(b) If $A^2$ and $B^2$ are similar, must A and B be similar? Explain.


17. Let C and D be $m \times n$ matrices, and let $B = \{v_1, v_2, \ldots, v_n\}$ be a basis for a vector space V. Show that if $C[x]_B = D[x]_B$ for all x in V, then $C = D$.


18. Find two nonzero 2 x 2 matrices that are not similar, and explain why they are not. 


19. Complete the proof below by justifying each step. 
Hypothesis: A and B are similar matrices. 
Conclusion: A and B have the same characteristic polynomial. 


Proof: 
1. det (w in B) = det (w =P “1 AP| 


6, = det(Al — A) 


20. If A and B are similar matrices, say $B = P^{-1}AP$, then it follows from Exercise 19 that A and B have the same eigenvalues. Suppose that $\lambda$ is one of the common eigenvalues and x is a corresponding eigenvector of A. See if you can find an eigenvector of B corresponding to $\lambda$ (expressed in terms of $\lambda$, x, and P).


21. Since the standard basis for $R^n$ is so simple, why would one want to represent a linear operator on $R^n$ in another basis?
Answer: 


The choice of an appropriate basis can yield a better understanding of the linear operator. 


22. Prove that trace is a similarity invariant. 
True-False Exercises 
In parts (a)—(h) determine whether the statement is true or false, and justify your answer. 
(a) A matrix cannot be similar to itself. 
Answer: 


False 


(b) If A is similar to B, and B is similar to C, then A is similar to C. 
Answer: 


True 


(c) If A and B are similar and B is singular, then A is singular. 
Answer: 


True 


(d) If A and B are invertible and similar, then $A^{-1}$ and $B^{-1}$ are similar.


Answer: 


True 
(e) If $T_1\colon R^n \to R^n$ and $T_2\colon R^n \to R^n$ are linear operators, and if $[T_1]_{B',B} = [T_2]_{B',B}$ with respect to two bases B and $B'$ for $R^n$, then $T_1(\mathbf{x}) = T_2(\mathbf{x})$ for every vector x in $R^n$.


Answer: 


True 


(f) If $T_1\colon R^n \to R^n$ is a linear operator, and if $[T_1]_B = [T_1]_{B'}$ with respect to two bases B and $B'$ for $R^n$, then $B = B'$.


Answer: 


False 


(g) If $T\colon R^n \to R^n$ is a linear operator, and if $[T]_B = I_n$ with respect to some basis B for $R^n$, then T is the identity operator on $R^n$.


Answer: 


True 


(h) If $T\colon R^n \to R^n$ is a linear operator, and if $[T]_{B',B} = I_n$ with respect to two bases B and $B'$ for $R^n$, then T is the identity operator on $R^n$.


Answer: 


False 




Chapter 8 Supplementary Exercises 


1. Let A be an $n \times n$ matrix, B a nonzero $n \times 1$ matrix, and x a vector in $R^n$ expressed in matrix notation. Is $T(\mathbf{x}) = A\mathbf{x} + B$ a linear operator on $R^n$? Justify your answer.


Answer: 


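No. $T(\mathbf{x}_1 + \mathbf{x}_2) = A(\mathbf{x}_1 + \mathbf{x}_2) + B \neq (A\mathbf{x}_1 + B) + (A\mathbf{x}_2 + B) = T(\mathbf{x}_1) + T(\mathbf{x}_2)$, and if $c \neq 1$, then $T(c\mathbf{x}) = cA\mathbf{x} + B \neq c(A\mathbf{x} + B) = cT(\mathbf{x})$.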


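2. Let
$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

(a) Show that
$$A^2 = \begin{bmatrix} \cos 2\theta & -\sin 2\theta \\ \sin 2\theta & \cos 2\theta \end{bmatrix} \quad\text{and}\quad A^3 = \begin{bmatrix} \cos 3\theta & -\sin 3\theta \\ \sin 3\theta & \cos 3\theta \end{bmatrix}$$

(b) Based on your answer to part (a), make a guess at the form of the matrix $A^n$ for any positive integer n.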


(c) By considering the geometric effect of multiplication by A, obtain the result in part (b) geometrically. 


3. Let $T\colon V \to V$ be defined by $T(\mathbf{v}) = \|\mathbf{v}\|\mathbf{v}$. Show that T is not a linear operator on V.


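4. Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ be fixed vectors in $R^n$, and let $T\colon R^n \to R^m$ be the function defined by $T(\mathbf{x}) = (\mathbf{x} \cdot \mathbf{v}_1, \mathbf{x} \cdot \mathbf{v}_2, \ldots, \mathbf{x} \cdot \mathbf{v}_m)$, where $\mathbf{x} \cdot \mathbf{v}_i$ is the Euclidean inner product on $R^n$.
(a) Show that T is a linear transformation.

(b) Show that the matrix with row vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ is the standard matrix for T.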


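5. Let $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \mathbf{e}_4\}$ be the standard basis for $R^4$, and let $T\colon R^4 \to R^3$ be the linear transformation for which
$$T(\mathbf{e}_1) = (1, 2, 1), \quad T(\mathbf{e}_2) = (0, 1, 0), \quad T(\mathbf{e}_3) = (1, 3, 0), \quad T(\mathbf{e}_4) = (1, 1, 1)$$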


(a) Find bases for the range and kernel of T. 
(b) Find the rank and nullity of T. 


Answer: 


(a) $T(\mathbf{e}_3)$ and any two of $T(\mathbf{e}_1)$, $T(\mathbf{e}_2)$, and $T(\mathbf{e}_4)$ form bases for the range; $(-1, 1, 0, 1)$ is a basis for the kernel.
(b) Rank = 3, nullity = 1


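6. Suppose that vectors in $R^3$ are denoted by $1 \times 3$ matrices, and define $T\colon R^3 \to R^3$ by
$$T([x_1 \;\; x_2 \;\; x_3]) = [x_1 \;\; x_2 \;\; x_3] \begin{bmatrix} -1 & 2 & 4 \\ 3 & 0 & 1 \\ 2 & 2 & 5 \end{bmatrix}$$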


(a) Find a basis for the kernel of T. 
(b) Find a basis for the range of T. 
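7. Let $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ be a basis for a vector space V, and let $T\colon V \to V$ be the linear operator for which
$$\begin{aligned} T(\mathbf{v}_1) &= \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_3 + 3\mathbf{v}_4 \\ T(\mathbf{v}_2) &= \mathbf{v}_1 - \mathbf{v}_2 + 2\mathbf{v}_3 + 2\mathbf{v}_4 \\ T(\mathbf{v}_3) &= 2\mathbf{v}_1 - 4\mathbf{v}_2 + 5\mathbf{v}_3 + 3\mathbf{v}_4 \\ T(\mathbf{v}_4) &= -2\mathbf{v}_1 + 6\mathbf{v}_2 - 6\mathbf{v}_3 - 2\mathbf{v}_4 \end{aligned}$$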


(a) Find the rank and nullity of T. 


(b) Determine whether T is one-to-one. 


Answer: 


(a) Rank(T) = 2 and nullity(T) = 2
(b) T is not one-to-one.


8. Let V and W be vector spaces, let T, $T_1$, and $T_2$ be linear transformations from V to W, and let k be a scalar. Define new transformations, $T_1 + T_2$ and $kT$, by the formulas
$$(T_1 + T_2)(\mathbf{x}) = T_1(\mathbf{x}) + T_2(\mathbf{x})$$
$$(kT)(\mathbf{x}) = k(T(\mathbf{x}))$$
(a) Show that $(T_1 + T_2)\colon V \to W$ and $kT\colon V \to W$ are both linear transformations.


(b) Show that the set of all linear transformations from V to W with the operations in part (a) is a vector 
space. 


9. Let A and B be similar matrices. Prove: 
(a) $A^T$ and $B^T$ are similar.
(b) If A and B are invertible, then $A^{-1}$ and $B^{-1}$ are similar.


10. Fredholm Alternative Theorem Let $T\colon V \to V$ be a linear operator on an n-dimensional vector space. Prove that exactly one of the following statements holds:

(i) The equation $T(\mathbf{x}) = \mathbf{b}$ has a solution for all vectors b in V.
(ii) Nullity of $T > 0$.
11. Let $T\colon M_{22} \to M_{22}$ be the linear operator defined by


re) =| | x} i 


Find the rank and nullity of T. 
Answer: 


Rank = 3, nullity = 1 


12. Prove: If A and B are similar matrices, and if B and C are also similar matrices, then A and C are similar 
matrices. 


13. Let $L\colon M_{22} \to M_{22}$ be the linear operator that is defined by $L(M) = M^T$. Find the matrix for L with respect to the standard basis for $M_{22}$.

Answer:
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
14. Let $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ and $B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ be bases for a vector space V, and let
2 <1 3 
P=/1 1 4 
a ae 


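be the transition matrix from $B'$ to B.
(a) Express $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ as linear combinations of $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3$.

(b) Express $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3$ as linear combinations of $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$.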


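15. Let $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ be a basis for a vector space V, and let $T\colon V \to V$ be a linear operator for which
$$[T]_B = \begin{bmatrix} -3 & 4 & 7 \\ 1 & 0 & -2 \\ 0 & 1 & 0 \end{bmatrix}$$

Find $[T]_{B'}$, where $B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is the basis for V defined by
$$\mathbf{v}_1 = \mathbf{u}_1, \quad \mathbf{v}_2 = \mathbf{u}_1 + \mathbf{u}_2, \quad \mathbf{v}_3 = \mathbf{u}_1 + \mathbf{u}_2 + \mathbf{u}_3$$

Answer:
$$[T]_{B'} = \begin{bmatrix} -4 & 0 & 9 \\ 1 & 0 & -2 \\ 0 & 1 & 1 \end{bmatrix}$$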
16. Show that the matrices


are similar but that 


are not. 
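17. Suppose that $T\colon V \to V$ is a linear operator, and B is a basis for V for which
$$[T(\mathbf{x})]_B = \begin{bmatrix} x_1 - x_2 + x_3 \\ x_2 \\ x_1 - x_3 \end{bmatrix} \quad\text{if}\quad [\mathbf{x}]_B = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

Find $[T]_B$.

Answer:
$$[T]_B = \begin{bmatrix} 1 & -1 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & -1 \end{bmatrix}$$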


18. Let $T\colon V \to V$ be a linear operator. Prove that T is one-to-one if and only if $\det(T) \neq 0$.

19. (Calculus required)

(a) Show that if $f = f(x)$ is twice differentiable, then the function $D\colon C^2(-\infty, \infty) \to F(-\infty, \infty)$ defined by $D(f) = f''(x)$ is a linear transformation.

(b) Find a basis for the kernel of D.
(c) Show that the set of functions satisfying the equation $D(f) = f(x)$ is a two-dimensional subspace of $C^2(-\infty, \infty)$, and find a basis for this subspace.

Answer:

(b) $f(x) = x$, $g(x) = 1$
(c) $f(x) = e^x$, $g(x) = e^{-x}$


20. Let $T\colon P_2 \to R^3$ be the function defined by the formula
$$T(p(x)) = \begin{bmatrix} p(-1) \\ p(0) \\ p(1) \end{bmatrix}$$

(a) Find $T(x^2 + 5x + 6)$.

(b) Show that T is a linear transformation.

(c) Show that T is one-to-one.

(d) Find $T^{-1}(0, 3, 0)$.
(e) Sketch the graph of the polynomial in part (d).


21. Let $x_1$, $x_2$, and $x_3$ be distinct real numbers such that
$$x_1 < x_2 < x_3$$
and let $T\colon P_2 \to R^3$ be the function defined by the formula
$$T(p(x)) = \begin{bmatrix} p(x_1) \\ p(x_2) \\ p(x_3) \end{bmatrix}$$

(a) Show that T is a linear transformation.
(b) Show that T is one-to-one.

(c) Verify that if $a_1$, $a_2$, and $a_3$ are any real numbers, then
$$T^{-1}\left(\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}\right) = a_1 P_1(x) + a_2 P_2(x) + a_3 P_3(x)$$
where
$$P_1(x) = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)}, \quad P_2(x) = \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)}, \quad P_3(x) = \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}$$

(d) What relationship exists between the graph of the function
$$a_1 P_1(x) + a_2 P_2(x) + a_3 P_3(x)$$
and the points $(x_1, a_1)$, $(x_2, a_2)$, and $(x_3, a_3)$?

Answer:

(d) The points are on the graph.
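The formula in part (c) can be checked numerically. The following is a minimal sketch, not part of the text: the function names `basis_polynomial` and `interpolate` and the sample nodes and values are illustrative assumptions, and exact rational arithmetic is used so that the checks involve no roundoff.

```python
from fractions import Fraction

def basis_polynomial(nodes, k, x):
    """Evaluate P_k at x: the product of (x - x_j)/(x_k - x_j) over all j != k."""
    value = Fraction(1)
    for j, xj in enumerate(nodes):
        if j != k:
            value *= Fraction(x - xj, nodes[k] - xj)
    return value

def interpolate(nodes, values, x):
    """Evaluate a_1 P_1(x) + a_2 P_2(x) + a_3 P_3(x) at x."""
    return sum(a * basis_polynomial(nodes, k, x) for k, a in enumerate(values))

# Hypothetical sample data: three distinct nodes and arbitrary values.
nodes = [-1, 0, 1]
values = [2, -1, 4]

# Each P_k equals 1 at its own node and 0 at the others,
# so the combination passes through every point (x_k, a_k).
for xk, ak in zip(nodes, values):
    print(xk, interpolate(nodes, values, xk) == ak)
```

Each basis polynomial is built so that all but one term vanishes at a node, which is why the combination reproduces the prescribed values there.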


22. (Calculus required) Let $p(x)$ and $q(x)$ be continuous functions, and let V be the subspace of $C(-\infty, +\infty)$ consisting of all twice differentiable functions. Define $L\colon V \to V$ by
$$L(y(x)) = y''(x) + p(x)y'(x) + q(x)y(x)$$
(a) Show that L is a linear transformation.
(b) Consider the special case where $p(x) = 0$ and $q(x) = 1$. Show that the function
$$\phi(x) = c_1 \sin x + c_2 \cos x$$
is in the kernel of L for all real values of $c_1$ and $c_2$.


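23. (Calculus required) Let $D\colon P_n \to P_n$ be the differentiation operator $D(p) = p'$. Show that the matrix for D relative to the basis $B = \{1, x, x^2, \ldots, x^n\}$ is
$$\begin{bmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 2 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & n \\ 0 & 0 & 0 & 0 & \cdots & 0 \end{bmatrix}$$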
24. (Calculus required) It can be shown that for any real number c, the vectors
$$1, \quad x - c, \quad \frac{(x - c)^2}{2!}, \quad \ldots, \quad \frac{(x - c)^n}{n!}$$
form a basis for $P_n$. Find the matrix for the differentiation operator of Exercise 23 with respect to this basis.


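25. (Calculus required) Let $J\colon P_n \to P_{n+1}$ be the integration transformation defined by
$$J(p) = \int_0^x (a_0 + a_1 t + \cdots + a_n t^n)\, dt = a_0 x + \frac{a_1}{2} x^2 + \cdots + \frac{a_n}{n+1} x^{n+1}$$
where $p = a_0 + a_1 x + \cdots + a_n x^n$. Find the matrix for J with respect to the standard bases for $P_n$ and $P_{n+1}$.

Answer:
$$\begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 1 & 0 & 0 & \cdots & 0 \\ 0 & \frac{1}{2} & 0 & \cdots & 0 \\ 0 & 0 & \frac{1}{3} & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \frac{1}{n+1} \end{bmatrix}$$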




CHAPTER 9


Numerical Methods 


CHAPTER CONTENTS 


9.1. LU-Decompositions 

9.2. The Power Method 

9.3. Internet Search Engines 

9.4. Comparison of Procedures for Solving Linear Systems 
9.5. Singular Value Decomposition 


9.6. Data Compression Using Singular Value Decomposition 


INTRODUCTION 


This chapter is concerned with “numerical methods” of linear algebra, an area of study 
that encompasses techniques for solving large-scale linear systems and for finding 
numerical approximations of various kinds. It is not our objective to discuss algorithms 
and technical issues in fine detail, since there are many excellent books on the subject. 
Rather, we will be concerned with introducing some of the basic ideas and exploring 
important contemporary applications that rely heavily on numerical ideas—singular value 
decomposition and data compression. A computing utility such as MATLAB, 
Mathematica, or Maple is recommended for Section 9.2 to Section 9.6 . 




9.1 LU-Decompositions 


Up to now, we have focused on two methods for solving linear systems, Gaussian elimination (reduction to row echelon form) and Gauss-Jordan elimination (reduction to reduced row echelon form). While these methods are fine for the small-scale problems in this text, they are not suitable for large-scale problems in which computer roundoff error, memory usage, and speed are concerns. In this section we will discuss a method for solving a linear system of n equations in n unknowns that is based on factoring its coefficient matrix into a product of lower and upper triangular matrices. This method, called "LU-decomposition," is the basis for many computer algorithms in common use.


Solving Linear Systems by Factoring 


Our first goal in this section is to show how to solve a linear system $A\mathbf{x} = \mathbf{b}$ of n equations in n unknowns by factoring the coefficient matrix A into a product


A=LU (1) 


where L is lower triangular and U is upper triangular. Once we understand how to do this, we will discuss how to 
obtain the factorization itself. 


Assuming that we have somehow obtained the factorization in (1), the linear system $A\mathbf{x} = \mathbf{b}$ can be solved by the following procedure, called LU-decomposition.


The Method of LU-Decomposition
Step 1. Rewrite the system $A\mathbf{x} = \mathbf{b}$ as
$$LU\mathbf{x} = \mathbf{b} \tag{2}$$
Step 2. Define a new $n \times 1$ matrix y by
$$U\mathbf{x} = \mathbf{y} \tag{3}$$
Step 3. Use (3) to rewrite (2) as $L\mathbf{y} = \mathbf{b}$ and solve this system for y.
Step 4. Substitute y in (3) and solve for x.


This procedure, which is illustrated in Figure 9.1.1, replaces the single linear system $A\mathbf{x} = \mathbf{b}$ by a pair of linear systems
$$U\mathbf{x} = \mathbf{y}$$
$$L\mathbf{y} = \mathbf{b}$$
that must be solved in succession. However, since each of these systems has a triangular coefficient matrix, it generally turns out to involve no more computation to solve the two systems than to solve the original system directly.


Solve Ax = b 


Figure 9.1.1 


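EXAMPLE 1 Solving Ax = b by LU-Decomposition


Later in this section we will derive the factorization
$$\underbrace{\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{U} \tag{4}$$
Use this result to solve the linear system
$$\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}$$

From (4) we can rewrite this system as
$$\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix} \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix} \tag{5}$$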

Historical Note In 1979 an important library of machine-independent linear algebra 
programs called LINPACK was developed at Argonne National Laboratories. Many of the 
programs in that library use the decomposition methods that we will study in this section. 
Variations of the LINPACK routines are used in many computer programs, including 
MATLAB, Mathematica, and Maple. 


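As specified in Step 2 above, let us define $y_1$, $y_2$, and $y_3$ by the equation
$$\underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}}_{\mathbf{y}} \tag{6}$$

which allows us to rewrite (5) as
$$\underbrace{\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}}_{\mathbf{y}} = \underbrace{\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}}_{\mathbf{b}} \tag{7}$$
or equivalently as
$$\begin{aligned} 2y_1 &= 2 \\ -3y_1 + y_2 &= 2 \\ 4y_1 - 3y_2 + 7y_3 &= 3 \end{aligned}$$
This system can be solved by a procedure that is similar to back substitution, except that we solve the equations from the top down instead of from the bottom up. This procedure, called forward substitution, yields
$$y_1 = 1, \quad y_2 = 5, \quad y_3 = 2$$
(verify). As indicated in Step 4 above, we substitute these values into (6), which yields the linear system
$$\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \\ 2 \end{bmatrix}$$

or, equivalently,
$$\begin{aligned} x_1 + 3x_2 + x_3 &= 1 \\ x_2 + 3x_3 &= 5 \\ x_3 &= 2 \end{aligned}$$
Solving this system by back substitution yields
$$x_1 = 2, \quad x_2 = -1, \quad x_3 = 2$$
(verify).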
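The two triangular solves in Example 1 can be sketched in a few lines of code. This is an illustrative sketch only: the function names `forward_substitution`, `back_substitution`, and `solve_lu` are assumptions made here, the matrices are stored as nested lists, and exact fractions are used so the output matches the hand computation.

```python
from fractions import Fraction

def forward_substitution(L, b):
    """Solve L y = b for lower triangular L, working from the top row down."""
    n = len(b)
    y = [Fraction(0)] * n
    for i in range(n):
        known = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (Fraction(b[i]) - known) / L[i][i]
    return y

def back_substitution(U, y):
    """Solve U x = y for upper triangular U, working from the bottom row up."""
    n = len(y)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(y[i]) - known) / U[i][i]
    return x

def solve_lu(L, U, b):
    """Solve L U x = b by first solving L y = b and then U x = y."""
    return back_substitution(U, forward_substitution(L, b))

# The factors and right-hand side from Example 1.
L = [[2, 0, 0], [-3, 1, 0], [4, -3, 7]]
U = [[1, 3, 1], [0, 1, 3], [0, 0, 1]]
b = [2, 2, 3]

y = forward_substitution(L, b)   # y1 = 1, y2 = 5, y3 = 2
x = back_substitution(U, y)      # x1 = 2, x2 = -1, x3 = 2
print([int(v) for v in y], [int(v) for v in x])
```

Both routines assume nonzero diagonal entries; each row costs one short sum and one division, which is why the two triangular systems are cheap to solve.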


Alan Mathison Turing (1912-1954) 


Historical Note Although the ideas were known earlier, credit for popularizing the matrix 
formulation of the LU-decomposition is often given to the British mathematician Alan 
Turing for his work on the subject in 1948. Turing, one of the great geniuses of the twentieth 
century, is the founder of the field of artificial intelligence. Among his many 
accomplishments in that field, he developed the concept of an internally programmed 
computer before the practical technology had reached the point where the construction of 


such a machine was possible. During World War II Turing was secretly recruited by the 
British government's Code and Cypher School at Bletchley Park to help break the Nazi 
Enigma codes; it was Turing's statistical approach that provided the breakthrough. In addition 
to being a brilliant mathematician, Turing was a world-class runner who competed 
successfully with Olympic-level competition. Sadly, Turing, a homosexual, was tried and 
convicted of “gross indecency” in 1952, in violation of the then-existing British statutes. 
Depressed, he committed suicide at age 41 by eating an apple laced with cyanide. 

[Image: Time & Life Pictures/Getty Images, Inc.] 


Finding LU-Decompositions 


Example 1 makes it clear that after A is factored into lower and upper triangular matrices, the system $A\mathbf{x} = \mathbf{b}$ can be solved by one forward substitution and one back substitution. We will now show how to obtain such factorizations. We begin with some terminology.


DEFINITION 1 


A factorization of a square matrix A as $A = LU$, where L is lower triangular and U is upper triangular, is called an LU-decomposition (or LU-factorization) of A.


Not every square matrix has an LU-decomposition. However, we will see that if it is possible to reduce a square matrix A to row echelon form by Gaussian elimination without performing any row interchanges, then A will have an LU-decomposition, though it may not be unique. To see why this is so, assume that A has been reduced to a row echelon form U using a sequence of row operations that does not include row interchanges. We know from Theorem 1.5.1 that these operations can be accomplished by multiplying A on the left by an appropriate sequence of elementary matrices; that is, there exist elementary matrices $E_1, E_2, \ldots, E_k$ such that
$$E_k \cdots E_2 E_1 A = U \tag{8}$$

Since elementary matrices are invertible, we can solve (8) for A as
$$A = E_1^{-1} E_2^{-1} \cdots E_k^{-1} U$$

or more briefly as
$$A = LU \tag{9}$$
where
$$L = E_1^{-1} E_2^{-1} \cdots E_k^{-1} \tag{10}$$

We now have all of the ingredients to prove the following result. 


THEOREM 9.1.1 


If A is a square matrix that can be reduced to a row echelon form U by Gaussian elimination without row interchanges, then A can be factored as $A = LU$, where L is a lower triangular matrix.


Proof Let L and U be the matrices in Formulas (10) and (8), respectively. The matrix U is upper triangular because it is a row echelon form of a square matrix (so all entries below its main diagonal are zero). To prove that L is lower triangular, it suffices to prove that each factor on the right side of (10) is lower triangular, since Theorem 1.7.1b will then imply that L itself is lower triangular. Since row interchanges are excluded, each $E_j$ results either by adding a scalar multiple of one row of an identity matrix to a row below or by multiplying one row of an identity matrix by a nonzero scalar. In either case, the resulting matrix $E_j$ is lower triangular and hence so is $E_j^{-1}$ by Theorem 1.7.1d.


This completes the proof.


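EXAMPLE 2 An LU-Decomposition


Find an LU-decomposition of
$$A = \begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}$$

Solution To obtain an LU-decomposition, $A = LU$, we will reduce A to a row echelon form U using Gaussian elimination and then calculate L from (10). For each step we list the row operation, the elementary matrix corresponding to the row operation, the inverse of the elementary matrix, and the resulting matrix in the reduction to row echelon form.

Step 1. $\tfrac{1}{2} \times$ row 1:
$$E_1 = \begin{bmatrix} \tfrac12 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_1^{-1} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \text{result: } \begin{bmatrix} 1 & 3 & 1 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}$$

Step 2. $(3 \times \text{row 1}) + \text{row 2}$:
$$E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \text{result: } \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 4 & 9 & 2 \end{bmatrix}$$

Step 3. $(-4 \times \text{row 1}) + \text{row 3}$:
$$E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}, \quad E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}, \quad \text{result: } \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & -3 & -2 \end{bmatrix}$$

Step 4. $(3 \times \text{row 2}) + \text{row 3}$:
$$E_4 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{bmatrix}, \quad E_4^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix}, \quad \text{result: } \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 7 \end{bmatrix}$$

Step 5. $\tfrac{1}{7} \times$ row 3:
$$E_5 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \tfrac17 \end{bmatrix}, \quad E_5^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{bmatrix}, \quad \text{result: } \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} = U$$

and, from (10),
$$L = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}$$
so
$$\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix} \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}$$

is an LU-decomposition of A.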


Bookkeeping 


As Example 2 shows, most of the work in constructing an LU-decomposition is expended in calculating L. However, all this work can be eliminated by some careful bookkeeping of the operations used to reduce A to U.


Because we are assuming that no row interchanges are required to reduce A to U, there are only two types of operations involved: multiplying a row by a nonzero constant, and adding a scalar multiple of one row to another. The first operation is used to introduce the leading 1's and the second to introduce zeros below the leading 1's.


In Example 2, a multiplier of $\tfrac12$ was needed in Step 1 to introduce a leading 1 in the first row, and a multiplier of $\tfrac17$ was needed in Step 5 to introduce a leading 1 in the third row. No actual multiplier was required to introduce a leading 1 in the second row because it was already a 1 at the end of Step 2, but for convenience let us say that the multiplier was 1. Comparing these multipliers with the successive diagonal entries of L, we see that these diagonal entries are precisely the reciprocals of the multipliers used to construct U:
$$L = \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix} \tag{11}$$

Also observe in Example 2 that to introduce zeros below the leading 1 in the first row, we used the operations
add 3 times the first row to the second
add $-4$ times the first row to the third
and to introduce the zero below the leading 1 in the second row, we used the operation
add 3 times the second row to the third
Now note in (12) that in each position below the main diagonal of L, the entry is the negative of the multiplier in the operation that introduced the zero in that position in U:
$$L = \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix} \tag{12}$$


This suggests the following procedure for constructing an LU-decomposition of a square matrix A, assuming that 
this matrix can be reduced to row echelon form without row interchanges. 


Procedure for Constructing an LU-Decomposition 


Step 1. Reduce A to a row echelon form U by Gaussian elimination without row interchanges, keeping track of the multipliers used to introduce the leading 1's and the multipliers used to introduce the zeros below the leading 1's.


Step 2. In each position along the main diagonal of L, place the reciprocal of the multiplier that introduced the leading 1 in that position in U.


Step 3. In each position below the main diagonal of L, place the negative of the multiplier used to introduce the zero in that position in U.


Step 4. Form the decomposition $A = LU$.
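The four-step procedure can be sketched as follows. This is an illustration under stated assumptions, not an algorithm from the text: the function name `lu_decompose` is hypothetical, exact fractions replace floating-point arithmetic, and the sketch stops with an error whenever a diagonal entry is zero, so it covers only matrices for which every leading 1 falls on the main diagonal.

```python
from fractions import Fraction

def lu_decompose(A):
    """Return (L, U) with A = L U, where U has 1's on its main diagonal.

    Assumes every pivot is nonzero, so no row interchange is needed.
    """
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]
    for k in range(n):
        pivot = U[k][k]
        if pivot == 0:
            raise ValueError("zero pivot: this sketch does not handle this case")
        # The multiplier 1/pivot introduces the leading 1 (Step 1);
        # its reciprocal goes on the diagonal of L (Step 2).
        L[k][k] = pivot
        U[k] = [v / pivot for v in U[k]]
        for i in range(k + 1, n):
            multiplier = -U[i][k]      # add multiplier * row k to row i
            L[i][k] = -multiplier      # Step 3: negative of the multiplier
            U[i] = [U[i][j] + multiplier * U[k][j] for j in range(n)]
    return L, U

# The matrix of Example 2.
L2, U2 = lu_decompose([[2, 6, 2], [-3, -8, 0], [4, 9, 2]])
print([[int(v) for v in row] for row in L2])   # [[2, 0, 0], [-3, 1, 0], [4, -3, 7]]
```

Storing the reciprocal and the negated multipliers as they arise is exactly the bookkeeping described above, so L is obtained without forming or inverting any elementary matrices.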


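EXAMPLE 3 Constructing an LU-Decomposition


Find an LU-decomposition of
$$A = \begin{bmatrix} 6 & -2 & 0 \\ 9 & -1 & 1 \\ 3 & 7 & 5 \end{bmatrix}$$

Solution We will reduce A to a row echelon form U and at each step we will fill in an entry of L in accordance with the four-step procedure above. Entries of L that have not yet been determined are shown as bullets.

$$A = \begin{bmatrix} 6 & -2 & 0 \\ 9 & -1 & 1 \\ 3 & 7 & 5 \end{bmatrix}, \qquad L = \begin{bmatrix} \bullet & 0 & 0 \\ \bullet & \bullet & 0 \\ \bullet & \bullet & \bullet \end{bmatrix}$$

Row 1, multiplier $= \tfrac16$:
$$\begin{bmatrix} 1 & -\tfrac13 & 0 \\ 9 & -1 & 1 \\ 3 & 7 & 5 \end{bmatrix}, \qquad L = \begin{bmatrix} 6 & 0 & 0 \\ \bullet & \bullet & 0 \\ \bullet & \bullet & \bullet \end{bmatrix}$$

Row 2, multiplier $= -9$; row 3, multiplier $= -3$:
$$\begin{bmatrix} 1 & -\tfrac13 & 0 \\ 0 & 2 & 1 \\ 0 & 8 & 5 \end{bmatrix}, \qquad L = \begin{bmatrix} 6 & 0 & 0 \\ 9 & \bullet & 0 \\ 3 & \bullet & \bullet \end{bmatrix}$$

Row 2, multiplier $= \tfrac12$:
$$\begin{bmatrix} 1 & -\tfrac13 & 0 \\ 0 & 1 & \tfrac12 \\ 0 & 8 & 5 \end{bmatrix}, \qquad L = \begin{bmatrix} 6 & 0 & 0 \\ 9 & 2 & 0 \\ 3 & \bullet & \bullet \end{bmatrix}$$

Row 3, multiplier $= -8$:
$$\begin{bmatrix} 1 & -\tfrac13 & 0 \\ 0 & 1 & \tfrac12 \\ 0 & 0 & 1 \end{bmatrix}, \qquad L = \begin{bmatrix} 6 & 0 & 0 \\ 9 & 2 & 0 \\ 3 & 8 & \bullet \end{bmatrix}$$

Row 3, multiplier $= 1$ (no actual operation is performed here since there is already a leading 1 in the third row):
$$U = \begin{bmatrix} 1 & -\tfrac13 & 0 \\ 0 & 1 & \tfrac12 \\ 0 & 0 & 1 \end{bmatrix}, \qquad L = \begin{bmatrix} 6 & 0 & 0 \\ 9 & 2 & 0 \\ 3 & 8 & 1 \end{bmatrix}$$

Thus, we have constructed the LU-decomposition
$$A = LU = \begin{bmatrix} 6 & 0 & 0 \\ 9 & 2 & 0 \\ 3 & 8 & 1 \end{bmatrix} \begin{bmatrix} 1 & -\tfrac13 & 0 \\ 0 & 1 & \tfrac12 \\ 0 & 0 & 1 \end{bmatrix}$$

We leave it for you to confirm this end result by multiplying the factors.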


LU-Decompositions Are Not Unique 


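In the absence of restrictions, LU-decompositions are not unique. For example, if
$$A = LU = \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix} \begin{bmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{bmatrix}$$

and L has nonzero diagonal entries, then we can shift the diagonal entries from the left factor to the right factor by writing
$$A = \begin{bmatrix} 1 & 0 & 0 \\ l_{21}/l_{11} & 1 & 0 \\ l_{31}/l_{11} & l_{32}/l_{22} & 1 \end{bmatrix} \begin{bmatrix} l_{11} & 0 & 0 \\ 0 & l_{22} & 0 \\ 0 & 0 & l_{33} \end{bmatrix} \begin{bmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{bmatrix}$$

$$= \begin{bmatrix} 1 & 0 & 0 \\ l_{21}/l_{11} & 1 & 0 \\ l_{31}/l_{11} & l_{32}/l_{22} & 1 \end{bmatrix} \begin{bmatrix} l_{11} & l_{11}u_{12} & l_{11}u_{13} \\ 0 & l_{22} & l_{22}u_{23} \\ 0 & 0 & l_{33} \end{bmatrix}$$

which is another LU-decomposition of A.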


LDU-Decompositions 


The method we have described for computing LU-decompositions may result in an "asymmetry" in that the matrix U has 1's on the main diagonal but L need not. However, if it is preferred to have 1's on the main diagonal of the lower triangular factor, then we can "shift" the diagonal entries of L to a diagonal matrix D and write L as
$$L = L'D$$
where $L'$ is a lower triangular matrix with 1's on the main diagonal. For example, a general $3 \times 3$ lower triangular matrix with nonzero entries on the main diagonal can be factored as
$$\underbrace{\begin{bmatrix} a_{11} & 0 & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & a_{33} \end{bmatrix}}_{L} = \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ a_{21}/a_{11} & 1 & 0 \\ a_{31}/a_{11} & a_{32}/a_{22} & 1 \end{bmatrix}}_{L'} \underbrace{\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix}}_{D}$$

Note that the columns of $L'$ are obtained by dividing each entry in the corresponding column of L by the diagonal entry in the column. Thus, for example, we can rewrite (4) as
$$\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix} \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ -\tfrac32 & 1 & 0 \\ 2 & -3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{bmatrix} \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}$$

One can prove that if A is a square matrix that can be reduced to row echelon form without row interchanges, then A can be factored uniquely as
$$A = LDU$$
where L is a lower triangular matrix with 1's on the main diagonal, D is a diagonal matrix, and U is an upper triangular matrix with 1's on the main diagonal. This is called the LDU-decomposition (or LDU-factorization) of A.
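The column-scaling rule for passing from L to $L'$ and D can be sketched as follows. The helper name `split_diagonal` is an assumption made for illustration, and the sketch presumes that every diagonal entry of L is nonzero.

```python
from fractions import Fraction

def split_diagonal(L):
    """Factor a lower triangular L with nonzero diagonal as L = L_unit * D.

    Column j of L_unit is column j of L divided by the diagonal entry L[j][j],
    and D carries the diagonal entries of L.
    """
    n = len(L)
    L_unit = [[Fraction(L[i][j]) / L[j][j] for j in range(n)] for i in range(n)]
    D = [[Fraction(L[i][i]) if i == j else Fraction(0) for j in range(n)]
         for i in range(n)]
    return L_unit, D

# The lower triangular factor from (4).
L_unit, D = split_diagonal([[2, 0, 0], [-3, 1, 0], [4, -3, 7]])
print([[str(v) for v in row] for row in L_unit])
# [['1', '0', '0'], ['-3/2', '1', '0'], ['2', '-3', '1']]
```

Combined with the unit upper triangular factor U from (4), these two matrices give the three factors of the LDU-decomposition displayed above.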


PLU-Decompositions 


Many computer algorithms for solving linear systems perform row interchanges to reduce roundoff error, in which case the existence of an LU-decomposition is not guaranteed. However, it is possible to work around this problem by "preprocessing" the coefficient matrix A so that the row interchanges are performed prior to computing the LU-decomposition itself. More specifically, the idea is to create a matrix Q (called a permutation matrix) by multiplying, in sequence, those elementary matrices that produce the row interchanges and then execute them by computing the product QA. This product can then be reduced to row echelon form without row interchanges, so it is assured to have an LU-decomposition
$$QA = LU \tag{13}$$

Because the matrix Q is invertible (being a product of elementary matrices), the systems $A\mathbf{x} = \mathbf{b}$ and $QA\mathbf{x} = Q\mathbf{b}$ will have the same solutions. But it follows from (13) that the latter system can be rewritten as $LU\mathbf{x} = Q\mathbf{b}$ and hence can be solved using LU-decomposition.


It is common to see Equation (13) expressed as
$$A = PLU \tag{14}$$

in which $P = Q^{-1}$. This is called a PLU-decomposition (or PLU-factorization) of A.
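A PLU-decomposition can be sketched in code as follows. Several choices here are assumptions made for illustration rather than the text's method: the names `plu_decompose` and `mat_mul` are hypothetical, A is assumed invertible, the interchanges are chosen during elimination by taking the entry of largest absolute value in each column (a common strategy) instead of being applied in advance, and exact fractions are used. The resulting factors still satisfy $QA = LU$ with $P = Q^{-1} = Q^T$.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Multiply two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def plu_decompose(A):
    """Return (P, L, U) with A = P L U for an invertible square matrix A."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]
    perm = list(range(n))              # perm[i] = original index of current row i
    for k in range(n):
        # Pick the row at or below row k with the largest entry in column k.
        p = max(range(k, n), key=lambda r: abs(U[r][k]))
        if U[p][k] == 0:
            raise ValueError("singular matrix: this sketch assumes A is invertible")
        if p != k:                     # record and perform the row interchange
            U[k], U[p] = U[p], U[k]
            L[k], L[p] = L[p], L[k]
            perm[k], perm[p] = perm[p], perm[k]
        L[k][k] = U[k][k]
        U[k] = [v / L[k][k] for v in U[k]]
        for i in range(k + 1, n):
            L[i][k] = U[i][k]
            U[i] = [U[i][j] - L[i][k] * U[k][j] for j in range(n)]
    # Q moves original row perm[i] into row i; P is its inverse (transpose).
    Q = [[Fraction(int(perm[i] == j)) for j in range(n)] for i in range(n)]
    P = [[Q[j][i] for j in range(n)] for i in range(n)]
    return P, L, U

A = [[0, 1], [1, 0]]                   # needs a row interchange before factoring
P, L, U = plu_decompose(A)
print(mat_mul(P, mat_mul(L, U)) == A)  # True
```

Because P is a permutation matrix, multiplying by $P^{-1}$ only reorders the entries of b, so solving $A\mathbf{x} = \mathbf{b}$ reduces to one reordering followed by a forward and a back substitution.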


Concept Review 
e LU-decomposition 
e LDU-decomposition 
e PLU-decomposition 


Skills 

e Determine whether a square matrix has an LU-decomposition. 
e Find an LU-decomposition of a square matrix. 

e Use the method of LU-decomposition to solve linear systems. 
e Find the LDU-decomposition of a square matrix. 


e Find a PLU-decomposition of a square matrix. 


Exercise Set 9.1 


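1. Use the method of Example 1 and the LU-decomposition
$$\begin{bmatrix} 3 & -6 \\ -2 & 5 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}$$
to solve the system
$$\begin{aligned} 3x_1 - 6x_2 &= 0 \\ -2x_1 + 5x_2 &= 1 \end{aligned}$$

Answer:

$x_1 = 2, \; x_2 = 1$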
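2. Use the method of Example 1 and the LU-decomposition
$$\begin{bmatrix} 3 & -6 & -3 \\ 2 & 0 & 6 \\ -4 & 7 & 4 \end{bmatrix} = \begin{bmatrix} 3 & 0 & 0 \\ 2 & 4 & 0 \\ -4 & -1 & 2 \end{bmatrix} \begin{bmatrix} 1 & -2 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$$
to solve the system
$$\begin{aligned} 3x_1 - 6x_2 - 3x_3 &= -3 \\ 2x_1 + 6x_3 &= -22 \\ -4x_1 + 7x_2 + 4x_3 &= 3 \end{aligned}$$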


In Exercises 3-10, find an LU-decomposition of the coefficient matrix, and then use the method of Example 1 to 
solve the system. 


[= -lle [2 


Answer: 


— 3 2)|%3 6 
Answer: 
xy=—1, x9=1, x3=0 
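6. $\begin{bmatrix} -3 & 12 & -6 \\ 1 & -2 & 2 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -33 \\ 7 \\ -1 \end{bmatrix}$
7. $\begin{bmatrix} 5 & 5 & 10 \\ -8 & -7 & -9 \\ 0 & 4 & 26 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 4 \end{bmatrix}$
Answer:
$x_1 = -1, \; x_2 = 1, \; x_3 = 0$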
8.|-1 =- —4 || x1 —6 
3 10 =—10})/%2)/=] =—3 
—2 —4 11 |} #3 9 
9.| =—1 0 1 O]/} %1 
2 3 =—2 6]/%2]_ | =1 
0 =1 2 O}} *3] | 
Oo 60 1 5)|*4 7 


Answer: 


xy= —3, x9=1, x3=2, x4=1 
10.);2 —4 OO Of] %1 8 

t. 32: 12: . 042 0 

O =—1 —4 =—5|/%3 1 

0 O 2 114|%4 0 


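11. Let
$$A = \begin{bmatrix} 2 & 1 & -1 \\ -2 & -1 & 2 \\ 2 & 1 & 0 \end{bmatrix}$$

(a) Find an LU-decomposition of A.

(b) Express A in the form $A = L_1 D U_1$, where $L_1$ is lower triangular with 1's along the main diagonal, $U_1$ is upper triangular, and D is a diagonal matrix.

(c) Express A in the form $A = L_2 U_2$, where $L_2$ is lower triangular with 1's along the main diagonal and $U_2$ is upper triangular.

Answer:
(a) $A = LU = \begin{bmatrix} 2 & 0 & 0 \\ -2 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \tfrac12 & -\tfrac12 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}$
(b) $A = L_1 D U_1 = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \tfrac12 & -\tfrac12 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}$
(c) $A = L_2 U_2 = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & -1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}$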


12 22 
A= 
fa 
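13. $A = \begin{bmatrix} 3 & -12 & 6 \\ 0 & 2 & 0 \\ 6 & -28 & 13 \end{bmatrix}$
Answer:
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & -2 & 1 \end{bmatrix} \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & -4 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$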


(a) Show that the matrix 


has no LU-decomposition. 


(b) Find a PLU-decomposition of this matrix. 


In Exercises 15-16, use the given PLU-decomposition of A to solve the linear system $A\mathbf{x} = \mathbf{b}$ by rewriting it as $P^{-1}A\mathbf{x} = P^{-1}\mathbf{b}$ and solving this system by LU-decomposition.


15. 2 014 
os | Tp AS] 1 8 S|; 
5 3 13 
01 0}/1 0 Of] 1 2 2 
A= }]1 0 0]}//0 1 OO 14 |=PEU 
001]/3 —5 1]/0 0 17 
Answer: 
eth es de 2 de 
ome F ecm b apr aa 
16. 3 rs ae 
6b = /0]; A=/0 2 1): 
6 8 18 
10 0;/1 #0 OF4 1 2 
A= |]0 01//2 1 O}]/0 =-1 4)/=PLU 
Cod, ees Sats 


In Exercises 17-18, find a PLU-decomposition of A, and use it to solve the linear system $A\mathbf{x} = \mathbf{b}$ by the method of Exercises 15 and 16.


17. a = =2 
A=|3 -1 1]; b=] 1 
0 
Answer 
1 
10 o1f3 0 off! 73 : 
A=|0 0 1]/0 2 olf, 1 if m=-5. m= 5. 33 
01 0}/3 0 1 2 
0 Oo 1 
18. i 3-—9 4 
A=|1 1 4]/;b=] 5 
i a 3 
19. Let 


(a) Prove: If $a \neq 0$, then the matrix A has a unique LU-decomposition with 1's along the main diagonal of L.
(b) Find the LU-decomposition described in part (a). 


Answer: 
(b) ab 1 O}je@ b 
a a 


20. Let $A\mathbf{x} = \mathbf{b}$ be a linear system of n equations in n unknowns, and assume that A is an invertible matrix that can be reduced to row echelon form without row interchanges. How many additions and multiplications are required to solve the system by the method of Example 1?


21. Prove: If A is any $n \times n$ matrix, then A can be factored as $A = PLU$, where L is lower triangular, U is upper triangular, and P can be obtained by interchanging the rows of $I_n$ appropriately. [Hint: Let U be a row echelon form of A, and let all row interchanges required in the reduction of A to U be performed first.]


True-False Exercises 
In parts (a)-(e) determine whether the statement is true or false, and justify your answer. 
(a) Every square matrix has an LU-decomposition. 

Answer: 


False 


(b) If a square matrix A is row equivalent to an upper triangular matrix U, then A has an LU-decomposition. 
Answer: 


False 


(c) If $L_1, L_2, \ldots, L_k$ are $n \times n$ lower triangular matrices, then the product $L_1 L_2 \cdots L_k$ is lower triangular.
Answer: 


True 


(d) If a square matrix A has an LU-decomposition, then A has a unique LDU-decomposition. 
Answer: 


True 


(e) Every square matrix has a PLU-decomposition. 
Answer: 


True 




9.2 The Power Method 


The eigenvalues of a square matrix can, in theory, be found by solving the characteristic equation. However, this 
procedure has so many computational difficulties that it is almost never used in applications. In this section we will 
discuss an algorithm that can be used to approximate the eigenvalue with greatest absolute value and a corresponding 
eigenvector. This particular eigenvalue and its corresponding eigenvectors are important because they arise naturally in 
many iterative processes. The methods we will study in this section have recently been used to create Internet search 
engines such as Google. We will discuss this application in the next section. 


The Power Method 


There are many applications in which some vector $\mathbf{x}_0$ in $R^n$ is multiplied repeatedly by an $n \times n$ matrix A to produce a sequence
$$\mathbf{x}_0, \quad A\mathbf{x}_0, \quad A^2\mathbf{x}_0, \quad \ldots, \quad A^k\mathbf{x}_0, \quad \ldots$$
We call a sequence of this form a power sequence generated by A. In this section we will be concerned with the


convergence of power sequences and how such sequences can be used to approximate eigenvalues and eigenvectors. 
For this purpose, we make the following definition. 


DEFINITION 1 


If the distinct eigenvalues of a matrix A are $\lambda_1, \lambda_2, \ldots, \lambda_k$, and if $|\lambda_1|$ is larger than $|\lambda_2|, \ldots, |\lambda_k|$, then $\lambda_1$ is called a dominant eigenvalue of A. Any eigenvector corresponding to a dominant eigenvalue is called a dominant eigenvector of A.


EXAMPLE 1 Dominant Eigenvalues <« 


Some matrices have dominant eigenvalues and some do not. For example, if the distinct eigenvalues of a matrix are
$$\lambda_1 = -4, \quad \lambda_2 = -2, \quad \lambda_3 = 1, \quad \lambda_4 = 3$$
then $\lambda_1 = -4$ is dominant since $|\lambda_1| = 4$ is greater than the absolute values of all the other eigenvalues; but if the distinct eigenvalues of a matrix are
$$\lambda_1 = 7, \quad \lambda_2 = -7, \quad \lambda_3 = -2, \quad \lambda_4 = 5$$
then $|\lambda_1| = |\lambda_2| = 7$, so there is no eigenvalue whose absolute value is greater than the absolute value of all the other eigenvalues.
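The test in Definition 1 is easy to express in code. The sketch below is illustrative only: the function name `dominant_eigenvalue` is an assumption, the eigenvalues are taken to be real and supplied as a list, and exact comparison of absolute values is appropriate only for exactly represented values such as the integers used here.

```python
def dominant_eigenvalue(eigenvalues):
    """Return the dominant eigenvalue of a list of real eigenvalues, or None.

    An eigenvalue is dominant when its absolute value strictly exceeds
    the absolute values of all the other distinct eigenvalues.
    """
    distinct = set(eigenvalues)
    largest = max(abs(v) for v in distinct)
    candidates = [v for v in distinct if abs(v) == largest]
    return candidates[0] if len(candidates) == 1 else None

print(dominant_eigenvalue([-4, -2, 1, 3]))   # -4
print(dominant_eigenvalue([7, -7, -2, 5]))   # None
```

The second call returns `None` because 7 and $-7$ tie for the largest absolute value, which is the situation described in the example.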


The most important theorems about convergence of power sequences apply to $n \times n$ matrices with n linearly independent eigenvectors (symmetric matrices, for example), so we will limit our discussion to this case in this section.


THEOREM 9.2.1 


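Let A be a symmetric $n \times n$ matrix with a positive dominant eigenvalue $\lambda$. If $\mathbf{x}_0$ is a unit vector in $R^n$ that is not orthogonal to the eigenspace corresponding to $\lambda$, then the normalized power sequence
$$\mathbf{x}_0, \quad \mathbf{x}_1 = \frac{A\mathbf{x}_0}{\|A\mathbf{x}_0\|}, \quad \mathbf{x}_2 = \frac{A\mathbf{x}_1}{\|A\mathbf{x}_1\|}, \quad \ldots, \quad \mathbf{x}_k = \frac{A\mathbf{x}_{k-1}}{\|A\mathbf{x}_{k-1}\|}, \quad \ldots \tag{1}$$

converges to a unit dominant eigenvector, and the sequence
$$A\mathbf{x}_1 \cdot \mathbf{x}_1, \quad A\mathbf{x}_2 \cdot \mathbf{x}_2, \quad A\mathbf{x}_3 \cdot \mathbf{x}_3, \quad \ldots, \quad A\mathbf{x}_k \cdot \mathbf{x}_k, \quad \ldots \tag{2}$$

converges to the dominant eigenvalue $\lambda$.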


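Remark In the exercises we will ask you to show that (1) can also be expressed as
$$\mathbf{x}_0, \quad \mathbf{x}_1 = \frac{A\mathbf{x}_0}{\|A\mathbf{x}_0\|}, \quad \mathbf{x}_2 = \frac{A^2\mathbf{x}_0}{\|A^2\mathbf{x}_0\|}, \quad \ldots, \quad \mathbf{x}_k = \frac{A^k\mathbf{x}_0}{\|A^k\mathbf{x}_0\|}, \quad \ldots \tag{3}$$

This form of the power sequence expresses each iterate in terms of the starting vector $\mathbf{x}_0$, rather than in terms of its predecessor.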


We will not prove Theorem 9.2.1, but we can make it plausible geometrically in the $2 \times 2$ case where A is a symmetric matrix with distinct positive eigenvalues, $\lambda_1$ and $\lambda_2$, one of which is dominant. To be specific, assume that $\lambda_1$ is dominant and
$$\lambda_1 > \lambda_2 > 0$$
Since we are assuming that A is symmetric and has distinct eigenvalues, it follows from Theorem 7.2.2 that the eigenspaces corresponding to $\lambda_1$ and $\lambda_2$ are perpendicular lines through the origin. Thus, the assumption that $\mathbf{x}_0$ is a unit vector that is not orthogonal to the eigenspace corresponding to $\lambda_1$ implies that $\mathbf{x}_0$ does not lie in the eigenspace corresponding to $\lambda_2$. To see the geometric effect of multiplying $\mathbf{x}_0$ by A, it will be useful to split $\mathbf{x}_0$ into the sum
$$\mathbf{x}_0 = \mathbf{v}_0 + \mathbf{w}_0 \tag{4}$$

where $\mathbf{v}_0$ and $\mathbf{w}_0$ are the orthogonal projections of $\mathbf{x}_0$ on the eigenspaces of $\lambda_1$ and $\lambda_2$, respectively (Figure 9.2.1).


Figure 9.2.1 [Panels (a), (b), (c); labels: $\mathbf{x}_0$, $\lambda_1\mathbf{v}_0 + \lambda_2\mathbf{w}_0$, $\mathbf{x}$, eigenspace of $\lambda_1$, eigenspace of $\lambda_2$]


This enables us to express $A\mathbf{x}_0$ as

$$A\mathbf{x}_0 = A\mathbf{v}_0 + A\mathbf{w}_0 = \lambda_1\mathbf{v}_0 + \lambda_2\mathbf{w}_0 \tag{5}$$

which tells us that multiplying $\mathbf{x}_0$ by A “scales” the terms $\mathbf{v}_0$ and $\mathbf{w}_0$ in (4) by $\lambda_1$ and $\lambda_2$, respectively. However, $\lambda_1$ is larger than $\lambda_2$, so the scaling is greater in the direction of $\mathbf{v}_0$ than in the direction of $\mathbf{w}_0$. Thus, multiplying $\mathbf{x}_0$ by A “pulls” $\mathbf{x}_0$ toward the eigenspace of $\lambda_1$, and normalizing produces a vector $\mathbf{x}_1 = A\mathbf{x}_0 / \|A\mathbf{x}_0\|$, which is on the unit circle and is closer to the eigenspace of $\lambda_1$ than $\mathbf{x}_0$ (Figure 9.2.1b). Similarly, multiplying $\mathbf{x}_1$ by A and normalizing produces a unit vector $\mathbf{x}_2$ that is closer to the eigenspace of $\lambda_1$ than $\mathbf{x}_1$. Thus, it seems reasonable that by repeatedly multiplying by A and normalizing we will produce a sequence of vectors $\mathbf{x}_k$ that lie on the unit circle and converge to a unit vector $\mathbf{x}$ in the eigenspace of $\lambda_1$ (Figure 9.2.1c). Moreover, if $\mathbf{x}_k$ converges to $\mathbf{x}$, then it also seems reasonable that $A\mathbf{x}_k \cdot \mathbf{x}_k$ will converge to

$$A\mathbf{x} \cdot \mathbf{x} = \lambda_1\mathbf{x} \cdot \mathbf{x} = \lambda_1\|\mathbf{x}\|^2 = \lambda_1$$

which is the dominant eigenvalue of A.


The Power Method with Euclidean Scaling 


Theorem 9.2.1 provides us with an algorithm for approximating the dominant eigenvalue and a corresponding unit 
eigenvector of a symmetric matrix A, provided the dominant eigenvalue is positive. This algorithm, called the power 
method with Euclidean scaling, is as follows: 


The Power Method with Euclidean Scaling 


Step 1. Choose an arbitrary nonzero vector and normalize it, if need be, to obtain a unit vector $\mathbf{x}_0$.


Step 2. Compute $A\mathbf{x}_0$ and normalize it to obtain the first approximation $\mathbf{x}_1$ to a dominant unit eigenvector. Compute $A\mathbf{x}_1 \cdot \mathbf{x}_1$ to obtain the first approximation to the dominant eigenvalue.


Step 3. Compute $A\mathbf{x}_1$ and normalize it to obtain the second approximation $\mathbf{x}_2$ to a dominant unit eigenvector. Compute $A\mathbf{x}_2 \cdot \mathbf{x}_2$ to obtain the second approximation to the dominant eigenvalue.


Step 4. Compute $A\mathbf{x}_2$ and normalize it to obtain the third approximation $\mathbf{x}_3$ to a dominant unit eigenvector. Compute $A\mathbf{x}_3 \cdot \mathbf{x}_3$ to obtain the third approximation to the dominant eigenvalue.


Continuing in this way will usually generate a sequence of better and better approximations to the dominant eigenvalue and a corresponding unit eigenvector.
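The steps above can be sketched in plain Python. This is a minimal illustration rather than a production routine: the helper names (`mat_vec`, `dot`, `normalize`, `power_method_euclidean`) are hypothetical, matrices are assumed to be lists of rows, and the theorem's hypotheses (symmetric matrix, positive dominant eigenvalue, suitable starting vector) are assumed rather than checked.

```python
import math


def mat_vec(A, x):
    """Matrix-vector product A x, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product u . v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def normalize(x):
    """Scale x to unit Euclidean length."""
    norm = math.sqrt(dot(x, x))
    return [xi / norm for xi in x]


def power_method_euclidean(A, x0, iterations):
    """Return [(x_1, lambda^(1)), (x_2, lambda^(2)), ...] for the given number of steps."""
    x = normalize(x0)                        # Step 1
    history = []
    for _ in range(iterations):
        x = normalize(mat_vec(A, x))         # next eigenvector approximation
        eigenvalue = dot(mat_vec(A, x), x)   # A x_k . x_k
        history.append((x, eigenvalue))
    return history


# The symmetric 2 x 2 matrix with rows (3, 2) and (2, 3), started at (1, 0).
A = [[3, 2], [2, 3]]
for k, (x, lam) in enumerate(power_method_euclidean(A, [1, 0], 5), start=1):
    print(k, [round(c, 5) for c in x], round(lam, 5))
```

For this matrix the eigenvalue estimates approach 5 and the vectors approach the unit vector with both entries equal to $1/\sqrt{2}$.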


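EXAMPLE 2 The Power Method with Euclidean Scaling


Apply the power method with Euclidean scaling to

$$A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \quad \text{with} \quad \mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

Stop at $\mathbf{x}_5$ and compare the resulting approximations to the exact values of the dominant eigenvalue and eigenvector.


Solution We will leave it for you to show that the eigenvalues of A are $\lambda = 1$ and $\lambda = 5$ and that the eigenspace corresponding to the dominant eigenvalue $\lambda = 5$ is the line represented by the parametric equations $x_1 = t$, $x_2 = t$, which we can write in vector form as

$$\mathbf{x} = t\begin{bmatrix} 1 \\ 1 \end{bmatrix} \tag{6}$$

Setting $t = 1/\sqrt{2}$ yields the normalized dominant eigenvector

$$\mathbf{v}_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix} \approx \begin{bmatrix} 0.707106781187\ldots \\ 0.707106781187\ldots \end{bmatrix} \tag{7}$$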
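Now let us see what happens when we use the power method, starting with the unit vector $\mathbf{x}_0$.

$$A\mathbf{x}_0 = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}, \qquad \mathbf{x}_1 = \frac{A\mathbf{x}_0}{\|A\mathbf{x}_0\|} = \frac{1}{\sqrt{13}}\begin{bmatrix} 3 \\ 2 \end{bmatrix} \approx \frac{1}{3.60555}\begin{bmatrix} 3 \\ 2 \end{bmatrix} \approx \begin{bmatrix} 0.83205 \\ 0.55470 \end{bmatrix}$$

$$A\mathbf{x}_1 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 0.83205 \\ 0.55470 \end{bmatrix} \approx \begin{bmatrix} 3.60555 \\ 3.32820 \end{bmatrix}, \qquad \mathbf{x}_2 = \frac{A\mathbf{x}_1}{\|A\mathbf{x}_1\|} \approx \frac{1}{4.90682}\begin{bmatrix} 3.60555 \\ 3.32820 \end{bmatrix} \approx \begin{bmatrix} 0.73480 \\ 0.67828 \end{bmatrix}$$

$$A\mathbf{x}_2 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 0.73480 \\ 0.67828 \end{bmatrix} \approx \begin{bmatrix} 3.56097 \\ 3.50445 \end{bmatrix}, \qquad \mathbf{x}_3 = \frac{A\mathbf{x}_2}{\|A\mathbf{x}_2\|} \approx \frac{1}{4.99616}\begin{bmatrix} 3.56097 \\ 3.50445 \end{bmatrix} \approx \begin{bmatrix} 0.71274 \\ 0.70143 \end{bmatrix}$$

$$A\mathbf{x}_3 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 0.71274 \\ 0.70143 \end{bmatrix} \approx \begin{bmatrix} 3.54108 \\ 3.52976 \end{bmatrix}, \qquad \mathbf{x}_4 = \frac{A\mathbf{x}_3}{\|A\mathbf{x}_3\|} \approx \frac{1}{4.99985}\begin{bmatrix} 3.54108 \\ 3.52976 \end{bmatrix} \approx \begin{bmatrix} 0.70824 \\ 0.70597 \end{bmatrix}$$

$$A\mathbf{x}_4 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 0.70824 \\ 0.70597 \end{bmatrix} \approx \begin{bmatrix} 3.53666 \\ 3.53440 \end{bmatrix}, \qquad \mathbf{x}_5 = \frac{A\mathbf{x}_4}{\|A\mathbf{x}_4\|} \approx \frac{1}{4.99999}\begin{bmatrix} 3.53666 \\ 3.53440 \end{bmatrix} \approx \begin{bmatrix} 0.70733 \\ 0.70688 \end{bmatrix}$$

$$\lambda^{(1)} = (A\mathbf{x}_1) \cdot \mathbf{x}_1 = (A\mathbf{x}_1)^T\mathbf{x}_1 \approx \begin{bmatrix} 3.60555 & 3.32820 \end{bmatrix}\begin{bmatrix} 0.83205 \\ 0.55470 \end{bmatrix} \approx 4.84615$$

$$\lambda^{(2)} = (A\mathbf{x}_2) \cdot \mathbf{x}_2 = (A\mathbf{x}_2)^T\mathbf{x}_2 \approx \begin{bmatrix} 3.56097 & 3.50445 \end{bmatrix}\begin{bmatrix} 0.73480 \\ 0.67828 \end{bmatrix} \approx 4.99361$$

$$\lambda^{(3)} = (A\mathbf{x}_3) \cdot \mathbf{x}_3 = (A\mathbf{x}_3)^T\mathbf{x}_3 \approx \begin{bmatrix} 3.54108 & 3.52976 \end{bmatrix}\begin{bmatrix} 0.71274 \\ 0.70143 \end{bmatrix} \approx 4.99974$$

$$\lambda^{(4)} = (A\mathbf{x}_4) \cdot \mathbf{x}_4 = (A\mathbf{x}_4)^T\mathbf{x}_4 \approx \begin{bmatrix} 3.53666 & 3.53440 \end{bmatrix}\begin{bmatrix} 0.70824 \\ 0.70597 \end{bmatrix} \approx 4.99999$$

$$\lambda^{(5)} = (A\mathbf{x}_5) \cdot \mathbf{x}_5 = (A\mathbf{x}_5)^T\mathbf{x}_5 \approx \begin{bmatrix} 3.53576 & 3.53531 \end{bmatrix}\begin{bmatrix} 0.70733 \\ 0.70688 \end{bmatrix} \approx 5.00000$$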


Thus, $\lambda^{(5)}$ approximates the dominant eigenvalue to five decimal place accuracy and $\mathbf{x}_5$ approximates the dominant eigenvector in (7) correctly to three decimal place accuracy.


It is accidental that $\lambda^{(5)}$ (the fifth approximation) produced five decimal place accuracy. In general, n iterations need not produce n decimal place accuracy.


The Power Method with Maximum Entry Scaling 


There is a variation of the power method in which the iterates, rather than being normalized at each stage, are scaled to 
make the maximum entry 1. To describe this method, it will be convenient to denote the maximum absolute value of the 
entries in a vector x by max(x). Thus, for example, if 


then max(x) = 7. We will need the following variation of Theorem 9.2.1. 


THEOREM 9.2.2 


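Let A be a symmetric $n \times n$ matrix with a positive dominant eigenvalue $\lambda$. If $\mathbf{x}_0$ is a nonzero vector in $R^n$ that is not orthogonal to the eigenspace corresponding to $\lambda$, then the sequence

$$\mathbf{x}_0, \quad \mathbf{x}_1 = \frac{A\mathbf{x}_0}{\max(A\mathbf{x}_0)}, \quad \mathbf{x}_2 = \frac{A\mathbf{x}_1}{\max(A\mathbf{x}_1)}, \quad \ldots, \quad \mathbf{x}_k = \frac{A\mathbf{x}_{k-1}}{\max(A\mathbf{x}_{k-1})}, \quad \ldots \tag{8}$$

converges to an eigenvector corresponding to $\lambda$, and the sequence

$$\frac{A\mathbf{x}_1 \cdot \mathbf{x}_1}{\mathbf{x}_1 \cdot \mathbf{x}_1}, \quad \frac{A\mathbf{x}_2 \cdot \mathbf{x}_2}{\mathbf{x}_2 \cdot \mathbf{x}_2}, \quad \frac{A\mathbf{x}_3 \cdot \mathbf{x}_3}{\mathbf{x}_3 \cdot \mathbf{x}_3}, \quad \ldots, \quad \frac{A\mathbf{x}_k \cdot \mathbf{x}_k}{\mathbf{x}_k \cdot \mathbf{x}_k}, \quad \ldots \tag{9}$$

converges to $\lambda$.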


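Remark In the exercises we will ask you to show that (8) can be written in the alternative form

$$\mathbf{x}_0, \quad \mathbf{x}_1 = \frac{A\mathbf{x}_0}{\max(A\mathbf{x}_0)}, \quad \mathbf{x}_2 = \frac{A^2\mathbf{x}_0}{\max(A^2\mathbf{x}_0)}, \quad \ldots, \quad \mathbf{x}_k = \frac{A^k\mathbf{x}_0}{\max(A^k\mathbf{x}_0)}, \quad \ldots \tag{10}$$

which expresses the iterates in terms of the initial vector $\mathbf{x}_0$.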


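We will omit the proof of this theorem, but if we accept that (8) converges to an eigenvector of A, then it is not hard to see why (9) converges to the dominant eigenvalue. For this purpose we note that each term in (9) is of the form

$$\frac{A\mathbf{x} \cdot \mathbf{x}}{\mathbf{x} \cdot \mathbf{x}} \tag{11}$$

which is called a Rayleigh quotient of A. In the case where $\lambda$ is an eigenvalue of A and $\mathbf{x}$ is a corresponding eigenvector, the Rayleigh quotient is

$$\frac{A\mathbf{x} \cdot \mathbf{x}}{\mathbf{x} \cdot \mathbf{x}} = \frac{\lambda\mathbf{x} \cdot \mathbf{x}}{\mathbf{x} \cdot \mathbf{x}} = \frac{\lambda(\mathbf{x} \cdot \mathbf{x})}{\mathbf{x} \cdot \mathbf{x}} = \lambda$$

Thus, if $\mathbf{x}_k$ converges to a dominant eigenvector $\mathbf{x}$, then it seems reasonable that

$$\frac{A\mathbf{x}_k \cdot \mathbf{x}_k}{\mathbf{x}_k \cdot \mathbf{x}_k} \quad \text{converges to} \quad \frac{A\mathbf{x} \cdot \mathbf{x}}{\mathbf{x} \cdot \mathbf{x}} = \lambda$$

which is the dominant eigenvalue.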
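As a small illustration of the Rayleigh quotient, the following Python sketch evaluates it for the symmetric matrix with rows (3, 2) and (2, 3). The function name `rayleigh_quotient` is a hypothetical helper, and the vector passed in is assumed to be nonzero.

```python
def rayleigh_quotient(A, x):
    """Return (A x . x) / (x . x) for a nonzero vector x; A is a list of rows."""
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    numerator = sum(axi * xi for axi, xi in zip(Ax, x))
    denominator = sum(xi * xi for xi in x)
    return numerator / denominator


A = [[3, 2], [2, 3]]
print(rayleigh_quotient(A, [1, 1]))     # exact eigenvector for eigenvalue 5
print(rayleigh_quotient(A, [1, -1]))    # exact eigenvector for eigenvalue 1
print(rayleigh_quotient(A, [1, 0.9]))   # near the dominant eigenvector, so close to 5
```

Because the quotient divides by $\mathbf{x} \cdot \mathbf{x}$, it does not depend on how the vector is scaled, which is why it can be used with maximum entry scaling as well as with unit vectors.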


Theorem 9.2.2 produces the following algorithm, called the power method with maximum entry scaling. 


The Power Method with Maximum Entry Scaling 


Step 1. Choose an arbitrary nonzero vector $\mathbf{x}_0$.

Step 2. Compute $A\mathbf{x}_0$ and multiply it by the factor $1/\max(A\mathbf{x}_0)$ to obtain the first approximation $\mathbf{x}_1$ to a dominant eigenvector. Compute the Rayleigh quotient of $\mathbf{x}_1$ to obtain the first approximation to the dominant eigenvalue.

Step 3. Compute $A\mathbf{x}_1$ and scale it by the factor $1/\max(A\mathbf{x}_1)$ to obtain the second approximation $\mathbf{x}_2$ to a dominant eigenvector. Compute the Rayleigh quotient of $\mathbf{x}_2$ to obtain the second approximation to the dominant eigenvalue.

Step 4. Compute $A\mathbf{x}_2$ and scale it by the factor $1/\max(A\mathbf{x}_2)$ to obtain the third approximation $\mathbf{x}_3$ to a dominant eigenvector. Compute the Rayleigh quotient of $\mathbf{x}_3$ to obtain the third approximation to the dominant eigenvalue.

Continuing in this way will generate a sequence of better and better approximations to the dominant eigenvalue and a corresponding eigenvector.
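The steps above can be sketched in Python as follows. This is an illustrative sketch only: the names `max_abs` and `power_method_max_entry` are hypothetical, `max_abs` plays the role of max(x) as defined in this section (the largest absolute value among the entries), and the hypotheses of Theorem 9.2.2 are assumed rather than checked.

```python
def mat_vec(A, x):
    """Matrix-vector product A x, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product u . v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def max_abs(x):
    """max(x) in the sense of this section: the largest absolute value of an entry."""
    return max(abs(c) for c in x)


def power_method_max_entry(A, x0, iterations):
    """Return [(x_1, lambda^(1)), (x_2, lambda^(2)), ...] using maximum entry scaling."""
    x = list(x0)
    history = []
    for _ in range(iterations):
        Ax = mat_vec(A, x)
        scale = max_abs(Ax)
        x = [c / scale for c in Ax]                      # x_k = A x_{k-1} / max(A x_{k-1})
        rayleigh = dot(mat_vec(A, x), x) / dot(x, x)     # Rayleigh quotient of x_k
        history.append((x, rayleigh))
    return history


A = [[3, 2], [2, 3]]
for k, (x, lam) in enumerate(power_method_max_entry(A, [1, 0], 5), start=1):
    print(k, [round(c, 5) for c in x], round(lam, 5))
```

With this scaling the iterates approach the eigenvector whose largest entry is 1, here (1, 1), while the Rayleigh quotients approach 5.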


John William Strutt Rayleigh (1842-1919) 


Historical Note The British mathematical physicist John Rayleigh won the Nobel prize in physics in 1904 for 
his discovery of the inert gas argon. Rayleigh also made fundamental discoveries in acoustics and optics, and 
his work in wave phenomena enabled him to give the first accurate explanation of why the sky is blue. 

[Image: The Granger Collection, New York] 


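EXAMPLE 3 Example 2 Revisited Using Maximum Entry Scaling

Apply the power method with maximum entry scaling to

$$A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \quad \text{with} \quad \mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

Stop at $\mathbf{x}_5$ and compare the resulting approximations to the exact values and to the approximations obtained in Example 2.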


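Solution We leave it for you to confirm that

$$A\mathbf{x}_0 = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}, \qquad \mathbf{x}_1 = \frac{A\mathbf{x}_0}{\max(A\mathbf{x}_0)} = \frac{1}{3}\begin{bmatrix} 3 \\ 2 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.66667 \end{bmatrix}$$

$$A\mathbf{x}_1 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1.00000 \\ 0.66667 \end{bmatrix} \approx \begin{bmatrix} 4.33333 \\ 4.00000 \end{bmatrix}, \qquad \mathbf{x}_2 = \frac{A\mathbf{x}_1}{\max(A\mathbf{x}_1)} \approx \frac{1}{4.33333}\begin{bmatrix} 4.33333 \\ 4.00000 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.92308 \end{bmatrix}$$

$$A\mathbf{x}_2 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1.00000 \\ 0.92308 \end{bmatrix} \approx \begin{bmatrix} 4.84615 \\ 4.76923 \end{bmatrix}, \qquad \mathbf{x}_3 = \frac{A\mathbf{x}_2}{\max(A\mathbf{x}_2)} \approx \frac{1}{4.84615}\begin{bmatrix} 4.84615 \\ 4.76923 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.98413 \end{bmatrix}$$

$$A\mathbf{x}_3 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1.00000 \\ 0.98413 \end{bmatrix} \approx \begin{bmatrix} 4.96825 \\ 4.95238 \end{bmatrix}, \qquad \mathbf{x}_4 = \frac{A\mathbf{x}_3}{\max(A\mathbf{x}_3)} \approx \frac{1}{4.96825}\begin{bmatrix} 4.96825 \\ 4.95238 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.99681 \end{bmatrix}$$

$$A\mathbf{x}_4 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1.00000 \\ 0.99681 \end{bmatrix} \approx \begin{bmatrix} 4.99361 \\ 4.99042 \end{bmatrix}, \qquad \mathbf{x}_5 = \frac{A\mathbf{x}_4}{\max(A\mathbf{x}_4)} \approx \frac{1}{4.99361}\begin{bmatrix} 4.99361 \\ 4.99042 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.99936 \end{bmatrix}$$

$$\lambda^{(1)} = \frac{A\mathbf{x}_1 \cdot \mathbf{x}_1}{\mathbf{x}_1 \cdot \mathbf{x}_1} = \frac{(A\mathbf{x}_1)^T\mathbf{x}_1}{\mathbf{x}_1^T\mathbf{x}_1} \approx \frac{7.00000}{1.44444} \approx 4.84615$$

$$\lambda^{(2)} = \frac{A\mathbf{x}_2 \cdot \mathbf{x}_2}{\mathbf{x}_2 \cdot \mathbf{x}_2} = \frac{(A\mathbf{x}_2)^T\mathbf{x}_2}{\mathbf{x}_2^T\mathbf{x}_2} \approx \frac{9.24852}{1.85207} \approx 4.99361$$

$$\lambda^{(3)} = \frac{A\mathbf{x}_3 \cdot \mathbf{x}_3}{\mathbf{x}_3 \cdot \mathbf{x}_3} = \frac{(A\mathbf{x}_3)^T\mathbf{x}_3}{\mathbf{x}_3^T\mathbf{x}_3} \approx \frac{9.84203}{1.96851} \approx 4.99974$$

$$\lambda^{(4)} = \frac{A\mathbf{x}_4 \cdot \mathbf{x}_4}{\mathbf{x}_4 \cdot \mathbf{x}_4} = \frac{(A\mathbf{x}_4)^T\mathbf{x}_4}{\mathbf{x}_4^T\mathbf{x}_4} \approx \frac{9.96808}{1.99362} \approx 4.99999$$

$$\lambda^{(5)} = \frac{A\mathbf{x}_5 \cdot \mathbf{x}_5}{\mathbf{x}_5 \cdot \mathbf{x}_5} = \frac{(A\mathbf{x}_5)^T\mathbf{x}_5}{\mathbf{x}_5^T\mathbf{x}_5} \approx \frac{9.99360}{1.99872} \approx 5.00000$$

Thus, $\lambda^{(5)}$ approximates the dominant eigenvalue correctly to five decimal places and $\mathbf{x}_5$ closely approximates the dominant eigenvector

$$\mathbf{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

that results by taking $t = 1$ in (6).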


Whereas the power method with Euclidean scaling 
produces a sequence that approaches a unit 
dominant eigenvector, maximum entry scaling 
produces a sequence that approaches an eigenvector 
whose largest component is 1. 


Rate of Convergence 


If A is a symmetric matrix whose distinct eigenvalues can be arranged so that

$$|\lambda_1| > |\lambda_2| \geq |\lambda_3| \geq \cdots \geq |\lambda_k|$$

then the “rate” at which the Rayleigh quotients converge to the dominant eigenvalue $\lambda_1$ depends on the ratio $|\lambda_1| / |\lambda_2|$; that is, the convergence is slow when this ratio is near 1 and rapid when it is large. The greater the ratio, the more rapid the convergence. For example, if A is a $2 \times 2$ symmetric matrix, then the greater the ratio $|\lambda_1| / |\lambda_2|$, the greater the disparity between the scaling effects of $\lambda_1$ and $\lambda_2$ in Figure 9.2.1, and hence the greater the effect that multiplication by A has on pulling the iterates toward the eigenspace of $\lambda_1$. Indeed, the rapid convergence in Example 3 is due to the fact that $|\lambda_1| / |\lambda_2| = 5/1 = 5$, which is considered to be a large ratio. In cases where the ratio is close to 1, the convergence of the power method may be so slow that other methods must be used.
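The effect of the ratio $|\lambda_1| / |\lambda_2|$ can be seen in a small experiment. The sketch below is an illustration under simplifying assumptions: it uses diagonal (hence symmetric) $2 \times 2$ matrices whose eigenvalues can be read off directly, a fixed starting vector, and a hypothetical helper name `rayleigh_iterations`; the tolerance is an arbitrary choice.

```python
import math


def rayleigh_iterations(lam1, lam2, tol=1e-6, max_iter=10_000):
    """Count Euclidean-scaling iterations until A x_k . x_k is within tol of lam1.

    A is the diagonal matrix diag(lam1, lam2) with lam1 > lam2 > 0.
    """
    x = [1 / math.sqrt(2), 1 / math.sqrt(2)]       # unit starting vector
    for k in range(1, max_iter + 1):
        y = [lam1 * x[0], lam2 * x[1]]             # A x for a diagonal A
        norm = math.hypot(y[0], y[1])
        x = [y[0] / norm, y[1] / norm]
        estimate = lam1 * x[0] ** 2 + lam2 * x[1] ** 2   # A x . x for the unit vector x
        if abs(estimate - lam1) < tol:
            return k
    return None


fast = rayleigh_iterations(5.0, 1.0)   # ratio 5
slow = rayleigh_iterations(5.0, 4.5)   # ratio about 1.11
print(fast, slow)
```

The run with ratio 5 needs only a handful of iterations, while the run with ratio close to 1 needs many more, in line with the discussion above.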


Stopping Procedures 


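If $\lambda$ is the exact value of the dominant eigenvalue, and if a power method produces the approximation $\lambda^{(k)}$ at the kth iteration, then we call

$$\left| \frac{\lambda - \lambda^{(k)}}{\lambda} \right| \tag{12}$$

the relative error in $\lambda^{(k)}$. If this is expressed as a percentage, then it is called the percentage error in $\lambda^{(k)}$. For example, if $\lambda = 5$ and the approximation after three iterations is $\lambda^{(3)} = 5.1$, then

$$\text{relative error in } \lambda^{(3)} = \left| \frac{\lambda - \lambda^{(3)}}{\lambda} \right| = \left| \frac{5 - 5.1}{5} \right| = |-0.02| = 0.02$$

$$\text{percentage error in } \lambda^{(3)} = 0.02 \times 100\% = 2\%$$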


In applications one usually knows the relative error E that can be tolerated in the dominant eigenvalue, so the goal is to stop computing iterates once the relative error in the approximation to that eigenvalue is less than E. However, there is a problem in computing the relative error from (12) in that the eigenvalue $\lambda$ is unknown. To circumvent this problem, it is usual to estimate $\lambda$ by $\lambda^{(k)}$ and stop the computations when

$$\left| \frac{\lambda^{(k)} - \lambda^{(k-1)}}{\lambda^{(k)}} \right| < E \tag{13}$$

The quantity on the left side of (13) is called the estimated relative error in $\lambda^{(k)}$ and its percentage form is called the estimated percentage error in $\lambda^{(k)}$.
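The stopping rule (13) can be sketched in Python on top of maximum entry scaling. This is a minimal sketch: the function name `power_method_with_stopping` and the `max_iter` safeguard are assumptions added for illustration, and the test matrix is the symmetric matrix with rows (3, 2) and (2, 3).

```python
def mat_vec(A, x):
    """Matrix-vector product A x, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product u . v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def power_method_with_stopping(A, x0, tolerance, max_iter=1000):
    """Iterate until the estimated relative error in the eigenvalue is below tolerance.

    Returns (k, eigenvalue estimate, eigenvector estimate).
    """
    x = list(x0)
    previous = None
    for k in range(1, max_iter + 1):
        Ax = mat_vec(A, x)
        scale = max(abs(c) for c in Ax)
        x = [c / scale for c in Ax]                       # maximum entry scaling
        estimate = dot(mat_vec(A, x), x) / dot(x, x)      # Rayleigh quotient
        # Estimated relative error needs two consecutive estimates.
        if previous is not None and abs((estimate - previous) / estimate) < tolerance:
            return k, estimate, x
        previous = estimate
    raise RuntimeError("tolerance not reached within max_iter iterations")


k, eigenvalue, vector = power_method_with_stopping([[3, 2], [2, 3]], [1, 0], 0.001)
print(k, round(eigenvalue, 5), [round(c, 5) for c in vector])
```

A tolerance of 0.001 corresponds to an estimated percentage error of 0.1%. Note that the rule compares successive estimates, so it measures how much the iterates are still changing, not the true error.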


EXAMPLE 4 Estimated Relative Error 


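For the computations in Example 3, find the smallest value of k for which the estimated percentage error in $\lambda^{(k)}$ is less than 0.1%.


Solution The estimated percentage errors in the approximations in Example 3 are as follows:

$$\begin{array}{lll}
\text{APPROXIMATION} & \text{RELATIVE ERROR} & \text{PERCENTAGE ERROR} \\[4pt]
\lambda^{(2)} & \left| \dfrac{\lambda^{(2)} - \lambda^{(1)}}{\lambda^{(2)}} \right| \approx \left| \dfrac{4.99361 - 4.84615}{4.99361} \right| \approx 0.02953 & 2.953\% \\[8pt]
\lambda^{(3)} & \left| \dfrac{\lambda^{(3)} - \lambda^{(2)}}{\lambda^{(3)}} \right| \approx \left| \dfrac{4.99974 - 4.99361}{4.99974} \right| \approx 0.00123 & 0.123\% \\[8pt]
\lambda^{(4)} & \left| \dfrac{\lambda^{(4)} - \lambda^{(3)}}{\lambda^{(4)}} \right| \approx \left| \dfrac{4.99999 - 4.99974}{4.99999} \right| \approx 0.00005 & 0.005\% \\[8pt]
\lambda^{(5)} & \left| \dfrac{\lambda^{(5)} - \lambda^{(4)}}{\lambda^{(5)}} \right| \approx \left| \dfrac{5.00000 - 4.99999}{5.00000} \right| \approx 0.00000 & 0\%
\end{array}$$

Thus, $\lambda^{(4)} \approx 4.99999$ is the first approximation whose estimated percentage error is less than 0.1%.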


Remark A rule for deciding when to stop an iterative process is called a stopping procedure. In the exercises, we will 
discuss stopping procedures for the power method that are based on the dominant eigenvector rather than the dominant 
eigenvalue. 


Concept Review 


Power sequence 


Dominant eigenvalue 


Dominant eigenvector 


Power method with Euclidean scaling 


Rayleigh quotient 


Power method with maximum entry scaling 


Relative error 


Percentage error 


Estimated relative error 


Estimated percentage error 


Stopping procedure 
Skills 


• Identify the dominant eigenvalue of a matrix.
• Use the power methods described in this section to approximate a dominant eigenvector.
• Find the estimated relative and percentage errors associated with the power methods.


Exercise Set 9.2 


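In Exercises 1-2, the distinct eigenvalues of a matrix are given. Determine whether A has a dominant eigenvalue, and if so, find it.


1. (a) $\lambda_1 = 7, \ \lambda_2 = 3, \ \lambda_3 = -8, \ \lambda_4 =$
(b) $\lambda_1 = -5, \ \lambda_2 = 3, \ \lambda_3 = 2, \ \lambda_4 = 5$


Answer:


(a) $\lambda_3$ dominant
(b) No dominant eigenvalue


2. (a) $\lambda_1 = 1, \ \lambda_2 = 0, \ \lambda_3 = -3, \ \lambda_4 = 2$
(b) $\lambda_1 = -3, \ \lambda_2 = -2, \ \lambda_3 = -1, \ \lambda_4 = 3$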


In Exercises 3—4, apply the power method with Euclidean scaling to the matrix A, starting with xg and stopping at x4. 
Compare the resulting approximations to the exact values of the dominant eigenvalue and the corresponding unit 
eigenvector. 


Answer: 


cme) 098058]. . [0.98837], | Lf 0.98679], [0.98715], 
ood seta 2 aese0e ly - | need |" sore 


dominant eigenvalue: $\lambda = 2 + \sqrt{10} \approx 5.16228$;


| 1 1 
dominant eigenvector: , = ra = lee a 
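4. $A = \begin{bmatrix} 7 & -2 & 0 \\ -2 & 6 & -2 \\ 0 & -2 & 5 \end{bmatrix}; \quad \mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$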


In Exercises 5—6, apply the power method with maximum entry scaling to the matrix A, starting with xg and stopping 
at x4. Compare the resulting approximations to the exact values of the dominant eigenvalue and the corresponding 
scaled eigenvector. 


[3 lt 


Answer: 


x} = ee (O26 = Bel sO S66 acta ae al \® ww 6.60550; 


sie een NO ws 660555: 


dominant eigenvalue: $\lambda = 3 + \sqrt{13} \approx 6.60555$;


3 


re ‘| ¥26+4¥13 | 7 _oarig6 
omuinant eigenvector: o+ i 3 aS 0.28167 
26 + 4y'13 


(a) Use the power method with maximum entry scaling to approximate a dominant eigenvector of A. Start with xg, 
round off all computations to three decimal places, and stop after three iterations. 


(b) Use the result in part (a) and the Rayleigh quotient to approximate the dominant eigenvalue of A. 
(c) Find the exact values of the eigenvector and eigenvalue approximated in parts (a) and (b). 


(d) Find the percentage error in the approximation of the dominant eigenvalue. 


Answer: 


(a) xj = ; x3= : X3 ; 
oe 2agi | a | =0909 


(b) $\lambda^{(1)} = 2.8, \ \lambda^{(2)} \approx 2.976, \ \lambda^{(3)} \approx 2.997$
(c) Dominant eigenvalue: }, — 3; dominant eigenvector: ; | 


(d) 0.1% 


8. Repeat the directions of Exercise 7 with 


In Exercises 9-10, a matrix A with a dominant eigenvalue and a sequence Xg, Axg, -.. A’xg are given. Use Formulas 


9 and 10 to approximate the dominant eigenvalue and a corresponding eigenvector. 


See (ees | 4 mas ; ee Be 
oe al Cl eal 


2.99993: beteed 


1.00000 


11. Consider matrices 


where $\mathbf{x}_0$ is a unit vector and $a \neq 0$. Show that even though the matrix A is symmetric and has a dominant eigenvalue, the power sequence (1) in Theorem 9.2.1 does not converge. This shows that the requirement in that theorem that the dominant eigenvalue be positive is essential.


12. Use the power method with Euclidean scaling to approximate the dominant eigenvalue and a corresponding 
eigenvector of A. Choose your own starting vector, and stop when the estimated percentage error in the eigenvalue 
approximation is less than 0.1%. 


ay f1 3 3 
2. mil 
a a (0 
(b)f1 0 1 1 
rei 
dish, 2 4 
fh, ase 


13. Repeat Exercise 12, but this time stop when all corresponding entries in two successive eigenvector approximations 
differ by less than 0.01 in absolute value. 


Answer: 


(a) Starting with $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, it takes 8 iterations.

(b) 


Starting with , It takes 8 iterations. 


oo CO kf 


14. Repeat Exercise 12 using maximum entry scaling. 
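15. Prove: If A is a nonzero $n \times n$ matrix, then $A^TA$ and $AA^T$ have positive dominant eigenvalues.
16. (For readers familiar with proof by induction) Let A be an $n \times n$ matrix, let $\mathbf{x}_0$ be a unit vector in $R^n$, and define the sequence $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k, \ldots$ by

$$\mathbf{x}_1 = \frac{A\mathbf{x}_0}{\|A\mathbf{x}_0\|}, \quad \mathbf{x}_2 = \frac{A\mathbf{x}_1}{\|A\mathbf{x}_1\|}, \quad \ldots, \quad \mathbf{x}_k = \frac{A\mathbf{x}_{k-1}}{\|A\mathbf{x}_{k-1}\|}, \quad \ldots$$

Prove by induction that $\mathbf{x}_k = A^k\mathbf{x}_0 / \|A^k\mathbf{x}_0\|$.


17. (For readers familiar with proof by induction) Let A be an $n \times n$ matrix, let $\mathbf{x}_0$ be a nonzero vector in $R^n$, and define the sequence $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k, \ldots$ by

$$\mathbf{x}_1 = \frac{A\mathbf{x}_0}{\max(A\mathbf{x}_0)}, \quad \mathbf{x}_2 = \frac{A\mathbf{x}_1}{\max(A\mathbf{x}_1)}, \quad \ldots, \quad \mathbf{x}_k = \frac{A\mathbf{x}_{k-1}}{\max(A\mathbf{x}_{k-1})}, \quad \ldots$$

Prove by induction that

$$\mathbf{x}_k = \frac{A^k\mathbf{x}_0}{\max(A^k\mathbf{x}_0)}$$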



9.3 Internet Search Engines 


Early search engines on the Internet worked by examining key words and phrases in pages and titles of posted documents. Today's most popular search engines use algorithms 
based on the power method to analyze hyperlinks (references) between documents. In this section we will discuss one of the ways in which this is done. 


Google, the most widely used engine for searching the Internet, was developed in 1996 by Larry Page and Sergey Brin while both were graduate students at Stanford University. 
Google uses a procedure known as the PageRank algorithm to analyze how documents at relevant sites reference one another. It then assigns to each site a PageRank score, 
stores those scores as a matrix, and uses the components of the dominant eigenvector of that matrix to establish the relative importance of the sites to the search. 


Google starts by using a standard text-based search engine to find an initial set $S_0$ of sites containing relevant pages. Since words can have multiple meanings, the set $S_0$ will typically contain irrelevant sites and miss others of relevance. To compensate for this, the set $S_0$ is expanded to a larger set S by adjoining all sites referenced by the pages in the sites of $S_0$. The underlying assumption is that S will contain the most important sites relevant to the search. This process is then repeated a number of times to refine the search information still further.


To be more specific, suppose that the search set S contains n sites, and define the adjacency matrix for S to be the $n \times n$ matrix $A = [a_{ij}]$ in which
$a_{ij} = 1$ if site i references site j
$a_{ij} = 0$ if site i does not reference site j


We will assume that no site references itself, so the diagonal entries of A will all be zero. 


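EXAMPLE 1 Adjacency Matrices


Here is a typical adjacency matrix for a search set with four sites (row i corresponds to referencing site i and column j to referenced site j):

$$A = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \tag{1}$$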


Thus, Site 1 references Sites 3 and 4, Site 2 references Site 1, and so forth. 


There are two basic roles that a site can play in the search process—the site may be a hub, meaning that it references many other sites, or it may be an authority, meaning that it 
is referenced by many other sites. A given site will typically have both hub and authority properties in that it will both reference and be referenced. 


Historical Note The term google is a variation of the word googol, which stands for the number $10^{100}$ (1 followed by 100 zeros). This term was invented by the American mathematician Edward Kasner (1878-1955) in 1938, and the story goes that it came about when Kasner asked his eight-year-old nephew to give a name to a really big number; he responded with “googol.” Kasner then went on to define a googolplex to be $10^{\text{googol}}$ (1 followed by googol zeros).


In general, if A is an adjacency matrix for n sites, then the column sums of A measure the authority aspect of the sites and the row sums of A measure their hub aspect. For example, the column sums of the matrix in (1) are 3, 1, 2, and 2, which means that Site 1 is referenced by three other sites, Site 2 is referenced by one other site, and so forth. Similarly, the row sums of the matrix in (1) are 2, 1, 2, and 3, so Site 1 references two other sites, Site 2 references one other site, and so forth.

Accordingly, if A is an adjacency matrix, then we call the vector $\mathbf{h}_0$ of row sums of A the initial hub vector of A, and we call the vector $\mathbf{a}_0$ of column sums of A the initial authority vector of A. Alternatively, we can think of $\mathbf{a}_0$ as the vector of row sums of $A^T$, which turns out to be more convenient for computations. The entries in the hub vector are called hub weights and those in the authority vector authority weights.


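EXAMPLE 2 Initial Hub and Authority Vectors of an Adjacency Matrix


Find the initial hub and authority vectors for the adjacency matrix A in Example 1.


Solution The row sums of A yield the initial hub vector

$$\mathbf{h}_0 = \begin{bmatrix} 2 \\ 1 \\ 2 \\ 3 \end{bmatrix} \begin{matrix} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{matrix} \tag{2}$$

and the row sums of $A^T$ (the column sums of A) yield the initial authority vector

$$\mathbf{a}_0 = \begin{bmatrix} 3 \\ 1 \\ 2 \\ 2 \end{bmatrix} \begin{matrix} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{matrix} \tag{3}$$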
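The link counting in this example amounts to taking row sums and column sums, which can be sketched in Python. The function names are hypothetical helpers, and the matrix below is the four-site adjacency matrix described in Example 1 (Site 1 references Sites 3 and 4, Site 2 references Site 1, and so forth), entered as a list of rows.

```python
def initial_hub_vector(A):
    """Row sums of the adjacency matrix: how many sites each site references."""
    return [sum(row) for row in A]


def initial_authority_vector(A):
    """Column sums of the adjacency matrix: how many sites reference each site."""
    return [sum(column) for column in zip(*A)]


A = [
    [0, 0, 1, 1],   # Site 1 references Sites 3 and 4
    [1, 0, 0, 0],   # Site 2 references Site 1
    [1, 0, 0, 1],   # Site 3 references Sites 1 and 4
    [1, 1, 1, 0],   # Site 4 references Sites 1, 2, and 3
]
print(initial_hub_vector(A))         # [2, 1, 2, 3]
print(initial_authority_vector(A))   # [3, 1, 2, 2]
```

Using `zip(*A)` transposes the list of rows, so the column sums are computed as row sums of the transpose, matching the remark that the authority vector is the vector of row sums of $A^T$.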


The link counting in Example 2 suggests that Site 4 is the major hub and Site 1 is the greatest authority. However, counting links does not tell the whole story; for example, it seems reasonable that if Site 1 is to be considered the greatest authority, then more weight should be given to hubs that link to that site, and if Site 4 is to be considered a major hub, then more weight should be given to sites to which it links. Thus, there is an interaction between hubs and authorities that needs to be accounted for in the search process. Accordingly, once the search engine has calculated the initial authority vector $\mathbf{a}_0$, it then uses the information in that vector to create new hub and authority vectors $\mathbf{h}_1$ and $\mathbf{a}_1$ using the formulas

$$\mathbf{h}_1 = \frac{A\mathbf{a}_0}{\|A\mathbf{a}_0\|} \quad \text{and} \quad \mathbf{a}_1 = \frac{A^T\mathbf{h}_1}{\|A^T\mathbf{h}_1\|} \tag{4}$$


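The numerators in these formulas do the weighting, and the normalization serves to control the size of the entries. To understand how the numerators accomplish the weighting, view the product $A\mathbf{a}_0$ as a linear combination of the column vectors of A with coefficients from $\mathbf{a}_0$. For example, with the adjacency matrix in Example 1 and the authority vector calculated in Example 2 we have

$$A\mathbf{a}_0 = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}\begin{bmatrix} 3 \\ 1 \\ 2 \\ 2 \end{bmatrix} = 3\begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \\ 5 \\ 6 \end{bmatrix} \begin{matrix} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{matrix}$$

Thus, we see that the links to each referenced site are weighted by the authority values in $\mathbf{a}_0$. To control the size of the entries, the search engine normalizes $A\mathbf{a}_0$ to produce the updated hub vector

$$\mathbf{h}_1 = \frac{A\mathbf{a}_0}{\|A\mathbf{a}_0\|} = \frac{1}{\sqrt{86}}\begin{bmatrix} 4 \\ 3 \\ 5 \\ 6 \end{bmatrix} \approx \begin{bmatrix} 0.43133 \\ 0.32350 \\ 0.53916 \\ 0.64700 \end{bmatrix} \begin{matrix} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{matrix} \qquad \text{New Hub Weights}$$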


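The new hub vector $\mathbf{h}_1$ can now be used to update the authority vector using Formula 4. The product $A^T\mathbf{h}_1$ performs the weighting, and the normalization controls the size:

$$A^T\mathbf{h}_1 \approx \begin{bmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} 0.43133 \\ 0.32350 \\ 0.53916 \\ 0.64700 \end{bmatrix} = 0.43133\begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} + 0.32350\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + 0.53916\begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} + 0.64700\begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix} \approx \begin{bmatrix} 1.50966 \\ 0.64700 \\ 1.07833 \\ 0.97049 \end{bmatrix} \begin{matrix} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{matrix}$$

$$\mathbf{a}_1 = \frac{A^T\mathbf{h}_1}{\|A^T\mathbf{h}_1\|} \approx \frac{1}{2.19142}\begin{bmatrix} 1.50966 \\ 0.64700 \\ 1.07833 \\ 0.97049 \end{bmatrix} \approx \begin{bmatrix} 0.68889 \\ 0.29524 \\ 0.49207 \\ 0.44286 \end{bmatrix} \begin{matrix} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{matrix} \qquad \text{New Authority Weights}$$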


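Once the updated hub and authority vectors, $\mathbf{h}_1$ and $\mathbf{a}_1$, are obtained, the search engine repeats the process and computes a succession of hub and authority vectors, thereby generating the interrelated sequences

$$\mathbf{h}_1 = \frac{A\mathbf{a}_0}{\|A\mathbf{a}_0\|}, \quad \mathbf{h}_2 = \frac{A\mathbf{a}_1}{\|A\mathbf{a}_1\|}, \quad \mathbf{h}_3 = \frac{A\mathbf{a}_2}{\|A\mathbf{a}_2\|}, \quad \ldots, \quad \mathbf{h}_k = \frac{A\mathbf{a}_{k-1}}{\|A\mathbf{a}_{k-1}\|}, \quad \ldots \tag{5}$$

$$\mathbf{a}_0, \quad \mathbf{a}_1 = \frac{A^T\mathbf{h}_1}{\|A^T\mathbf{h}_1\|}, \quad \mathbf{a}_2 = \frac{A^T\mathbf{h}_2}{\|A^T\mathbf{h}_2\|}, \quad \mathbf{a}_3 = \frac{A^T\mathbf{h}_3}{\|A^T\mathbf{h}_3\|}, \quad \ldots, \quad \mathbf{a}_k = \frac{A^T\mathbf{h}_k}{\|A^T\mathbf{h}_k\|}, \quad \ldots \tag{6}$$

However, each of these is a power sequence in disguise. For example, if we substitute the expression for $\mathbf{h}_k$ into the expression for $\mathbf{a}_k$, then we obtain

$$\mathbf{a}_k = \frac{A^T\mathbf{h}_k}{\|A^T\mathbf{h}_k\|} = \frac{A^T\left( \dfrac{A\mathbf{a}_{k-1}}{\|A\mathbf{a}_{k-1}\|} \right)}{\left\| A^T\left( \dfrac{A\mathbf{a}_{k-1}}{\|A\mathbf{a}_{k-1}\|} \right) \right\|} = \frac{(A^TA)\mathbf{a}_{k-1}}{\|(A^TA)\mathbf{a}_{k-1}\|}$$

which means that we can rewrite (6) as

$$\mathbf{a}_0, \quad \mathbf{a}_1 = \frac{(A^TA)\mathbf{a}_0}{\|(A^TA)\mathbf{a}_0\|}, \quad \mathbf{a}_2 = \frac{(A^TA)\mathbf{a}_1}{\|(A^TA)\mathbf{a}_1\|}, \quad \ldots, \quad \mathbf{a}_k = \frac{(A^TA)\mathbf{a}_{k-1}}{\|(A^TA)\mathbf{a}_{k-1}\|}, \quad \ldots \tag{7}$$

Similarly, we can rewrite (5) as

$$\mathbf{h}_1 = \frac{A\mathbf{a}_0}{\|A\mathbf{a}_0\|}, \quad \mathbf{h}_2 = \frac{(AA^T)\mathbf{h}_1}{\|(AA^T)\mathbf{h}_1\|}, \quad \ldots, \quad \mathbf{h}_k = \frac{(AA^T)\mathbf{h}_{k-1}}{\|(AA^T)\mathbf{h}_{k-1}\|}, \quad \ldots \tag{8}$$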


Remark In Exercise 15 of Section 9.2 you were asked to show that $A^TA$ and $AA^T$ both have positive dominant eigenvalues. That being the case, Theorem 9.2.1 ensures that (7) and (8) converge to the dominant eigenvectors of $A^TA$ and $AA^T$, respectively. The entries in those eigenvectors are the authority and hub weights that Google uses to rank the search sites in order of importance as hubs and authorities.
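The alternating hub and authority updates can be sketched in Python. This is a minimal illustration of the update formulas in this section, not a description of any production search engine: the function names are hypothetical, the adjacency matrix is assumed to be a list of rows with at least one link, and the four-site matrix is the one described in Example 1.

```python
import math


def mat_vec(M, x):
    """Matrix-vector product M x, with M given as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def normalize(x):
    """Scale x to unit Euclidean length."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]


def hub_authority_iterates(A, steps):
    """Return [(h_1, a_1), (h_2, a_2), ...] from the alternating updates."""
    At = [list(column) for column in zip(*A)]     # transpose of A
    a = [float(sum(row)) for row in At]           # a_0: column sums of A
    results = []
    for _ in range(steps):
        h = normalize(mat_vec(A, a))              # h_k = A a_{k-1} / ||A a_{k-1}||
        a = normalize(mat_vec(At, h))             # a_k = A^T h_k / ||A^T h_k||
        results.append((h, a))
    return results


A = [[0, 0, 1, 1], [1, 0, 0, 0], [1, 0, 0, 1], [1, 1, 1, 0]]
h1, a1 = hub_authority_iterates(A, 1)[0]
print([round(c, 5) for c in h1])
print([round(c, 5) for c in a1])
```

Running more steps carries the two sequences forward together; each is equivalent to a power sequence for $AA^T$ or $A^TA$, so the same convergence considerations apply.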


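EXAMPLE 3 A Ranking Procedure


Suppose that a search engine produces 10 Internet sites in its search set and that the adjacency matrix for those sites (row i for referencing site i, column j for referenced site j) is

$$A = \begin{bmatrix}
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0
\end{bmatrix}$$

Use Formula 7 to rank the sites in decreasing order of authority.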


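Solution We will take $\mathbf{a}_0$ to be the normalized vector of column sums of A, and then we will compute the iterates in (7) until the authority vectors seem to stabilize. We leave it for you to show that

$$\mathbf{a}_0 = \frac{1}{\sqrt{54}}\begin{bmatrix} 0 \\ 2 \\ 1 \\ 1 \\ 5 \\ 3 \\ 1 \\ 3 \\ 0 \\ 2 \end{bmatrix} \approx \begin{bmatrix} 0 \\ 0.27217 \\ 0.13608 \\ 0.13608 \\ 0.68041 \\ 0.40825 \\ 0.13608 \\ 0.40825 \\ 0 \\ 0.27217 \end{bmatrix}$$

and that

$$(A^TA)\mathbf{a}_0 \approx \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 2 & 1 & 1 & 2 & 0 & 0 & 2 & 0 & 1 \\
0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 1 \\
0 & 2 & 1 & 1 & 5 & 0 & 0 & 2 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 3 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 2 & 1 & 1 & 2 & 0 & 0 & 3 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 2
\end{bmatrix}\begin{bmatrix} 0 \\ 0.27217 \\ 0.13608 \\ 0.13608 \\ 0.68041 \\ 0.40825 \\ 0.13608 \\ 0.40825 \\ 0 \\ 0.27217 \end{bmatrix} \approx \begin{bmatrix} 0 \\ 3.26599 \\ 1.90516 \\ 1.90516 \\ 5.30723 \\ 1.36083 \\ 0.54433 \\ 3.67423 \\ 0 \\ 2.17732 \end{bmatrix}$$

Thus,

$$\mathbf{a}_1 = \frac{(A^TA)\mathbf{a}_0}{\|(A^TA)\mathbf{a}_0\|} \approx \frac{1}{8.15362}\begin{bmatrix} 0 \\ 3.26599 \\ 1.90516 \\ 1.90516 \\ 5.30723 \\ 1.36083 \\ 0.54433 \\ 3.67423 \\ 0 \\ 2.17732 \end{bmatrix} \approx \begin{bmatrix} 0 \\ 0.40056 \\ 0.23366 \\ 0.23366 \\ 0.65090 \\ 0.16690 \\ 0.06676 \\ 0.45063 \\ 0 \\ 0.26704 \end{bmatrix}$$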

Continuing in this way yields the following authority iterates, where $\mathbf{a}_k = (A^TA)\mathbf{a}_{k-1} / \|(A^TA)\mathbf{a}_{k-1}\|$ for $k \geq 1$:

$$\begin{array}{lcccccccc}
 & \mathbf{a}_0 & \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 & \mathbf{a}_4 & \cdots & \mathbf{a}_9 & \mathbf{a}_{10} \\
\text{Site 1} & 0 & 0 & 0 & 0 & 0 & & 0 & 0 \\
\text{Site 2} & 0.27217 & 0.40056 & 0.41652 & 0.41918 & 0.41973 & & 0.41990 & 0.41990 \\
\text{Site 3} & 0.13608 & 0.23366 & 0.24917 & 0.25233 & 0.25309 & & 0.25337 & 0.25337 \\
\text{Site 4} & 0.13608 & 0.23366 & 0.24917 & 0.25233 & 0.25309 & & 0.25337 & 0.25337 \\
\text{Site 5} & 0.68041 & 0.65090 & 0.63407 & 0.62836 & 0.62665 & \cdots & 0.62597 & 0.62597 \\
\text{Site 6} & 0.40825 & 0.16690 & 0.06322 & 0.02372 & 0.00889 & & 0.00007 & 0.00002 \\
\text{Site 7} & 0.13608 & 0.06676 & 0.02603 & 0.00981 & 0.00368 & & 0.00003 & 0.00001 \\
\text{Site 8} & 0.40825 & 0.45063 & 0.46672 & 0.47050 & 0.47137 & & 0.47165 & 0.47165 \\
\text{Site 9} & 0 & 0 & 0 & 0 & 0 & & 0 & 0 \\
\text{Site 10} & 0.27217 & 0.26704 & 0.27892 & 0.28300 & 0.28416 & & 0.28460 & 0.28460
\end{array}$$


The small changes between $\mathbf{a}_9$ and $\mathbf{a}_{10}$ suggest that the iterates have stabilized near a dominant eigenvector of $A^TA$. From the entries in $\mathbf{a}_{10}$ we conclude that Sites 1, 6, 7, and 9 are probably irrelevant to the search and that the remaining sites should be searched in order of decreasing importance as


Site 5, Site 8, Site 2, Site 10, Sites 3 and 4 (a tie)
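The ranking procedure of this example can be sketched in Python by iterating Formula 7 directly. The sketch is illustrative only: the function name `authority_ranking` is hypothetical, and the ten rows below are the adjacency matrix of this example as read here, with any hard-to-read row chosen to agree with the column sums and the product $A^TA$ shown in the solution, so treat the data as an assumption.

```python
import math

# Row i lists the sites referenced by site i + 1 (1 = references, 0 = does not).
A = [
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
]


def normalize(x):
    """Scale x to unit Euclidean length."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]


def authority_ranking(A, iterations=10):
    """Iterate a_k = (A^T A) a_{k-1} / ||(A^T A) a_{k-1}|| and rank sites by weight."""
    n = len(A)
    At = [list(column) for column in zip(*A)]
    AtA = [[sum(At[i][k] * A[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
    a = normalize([float(sum(row)) for row in At])   # normalized column sums of A
    for _ in range(iterations):
        a = normalize([sum(AtA[i][j] * a[j] for j in range(n)) for i in range(n)])
    order = sorted(range(n), key=lambda i: -a[i])     # largest weight first
    return a, [i + 1 for i in order]                  # site numbers start at 1


weights, ranking = authority_ranking(A)
print([round(w, 5) for w in weights])
print(ranking)
```

Sites whose weights are essentially zero after the iterations end up at the bottom of the ranking; deciding that they are irrelevant is a judgment made by inspecting the weights, not something the sketch does automatically.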


Concept Review 

• Adjacency matrix

• Hub vector

• Authority vector

• Hub weights

• Authority weights

Skills

• Find the initial hub and authority vectors of an adjacency matrix.

• Use the method of Example 3 to rank sites.


Exercise Set 9.3 


In Exercises 1—2, find the initial hub and authority vectors for the given adjacency matrix A. 


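1. $A = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}$ (row i: referencing site i; column j: referenced site j)

Answer:

$$\mathbf{h}_0 = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \quad \mathbf{a}_0 = \begin{bmatrix} 2 \\ 0 \\ 3 \end{bmatrix}$$

2. $A = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}$ (row i: referencing site i; column j: referenced site j)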

In Exercises 3—4, find the updated hub and authority vectors h; and aj for the adjacency matrix A. 


3. The matrix in Exercise 1. 


Answer: 
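$$\mathbf{h}_1 \approx \begin{bmatrix} 0.39057 \\ 0.65094 \\ 0.65094 \end{bmatrix}, \quad \mathbf{a}_1 \approx \begin{bmatrix} 0.60971 \\ 0 \\ 0.79262 \end{bmatrix}$$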


4. The matrix in Exercise 2. 


In Exercises 5—8, the adjacency matrix A of an Internet search engine is given. Use the method of Example 3 to rank the sites in decreasing order of authority. 


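5. $A = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$ (row i: referencing site i; column j: referenced site j)

Answer:


Sites 1 and 2 (tie); sites 3 and 4 are irrelevant


6. $A = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}$ (row i: referencing site i; column j: referenced site j)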

7. Referenced Site 


12345 
01110) 1 
4 Ue aes San i 2 Ps . 
A= 00 0-011 3 Referencing Site 
0100 0) 4 
Dol 1) B28 | 
Answer: 


Site 2, site 3, site 4; sites 1 and 5 are irrelevant 
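8. (row i: referencing site i; column j: referenced site j)

$$A = \begin{bmatrix}
0 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0
\end{bmatrix}$$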




9.4 Comparison of Procedures for Solving Linear Systems


There is an old saying that “time is money.” This is especially true in industry where the cost of solving a linear 
system is generally determined by the time it takes for a computer to perform the required computations. This 
typically depends both on the speed of the computer processor and on the number of operations required by the 
algorithm. Thus, choosing the right algorithm has important financial implications in an industrial or research setting.
In this section we will discuss some of the factors that affect the choice of algorithms for solving large-scale linear 
systems. 


Flops and the Cost of Solving a Linear System 


In computer jargon, an arithmetic operation ($+$, $-$, $\times$, $\div$) on two real numbers is called a flop, which is an acronym for “floating-point operation.” The total number of flops required to solve a problem, which is called the cost of the solution, provides a convenient way of choosing between various algorithms for solving the problem. When needed, the cost in flops can be converted to units of time or money if the speed of the computer processor and the financial aspects of its operation are known. For example, many of today's personal computers are capable of performing in excess of 10 gigaflops per second (1 gigaflop = $10^9$ flops). Thus, an algorithm that costs 1,000,000 flops would be executed in 0.0001 seconds.
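The conversion from flops to time is a single division, sketched below. The function name `execution_time_seconds` is a hypothetical helper, and the processor speed is simply the 10 gigaflops per second figure quoted above, not a measurement.

```python
GIGAFLOP = 10**9  # flops in one gigaflop


def execution_time_seconds(cost_in_flops, gigaflops_per_second):
    """Time needed to execute a given flop count at a given processor speed."""
    return cost_in_flops / (gigaflops_per_second * GIGAFLOP)


print(execution_time_seconds(1_000_000, 10))   # 0.0001
```

The same function can be applied to any of the flop counts derived later in this section once a processor speed is assumed.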


To illustrate how costs (in flops) can be computed, let us count the number of flops required to solve a linear system of n equations in n unknowns by Gauss-Jordan elimination. For this purpose we will need the following formulas for the sum of the first n positive integers and the sum of the squares of the first n positive integers:

$$1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2} \tag{1}$$

$$1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} \tag{2}$$


Let $A\mathbf{x} = \mathbf{b}$ be a linear system of n equations in n unknowns to be solved by Gauss-Jordan elimination (or, equivalently, by Gaussian elimination with back substitution). For simplicity, let us assume that A is invertible and that no row interchanges are required to reduce the augmented matrix $[A \mid \mathbf{b}]$ to row echelon form. The diagrams that accompany the following analysis provide a convenient way of counting the operations required to introduce a leading 1 in the first row and then zeros below it. In our operation counts, we will lump divisions and multiplications together as “multiplications,” and we will lump additions and subtractions together as “additions.”


Step 1. It requires n flops (multiplications) to introduce the leading 1 in the first row.

$$\left[ \begin{array}{ccccc|c}
1 & \times & \cdots & \times & \times & \times \\
\bullet & \bullet & \cdots & \bullet & \bullet & \bullet \\
\vdots & \vdots & & \vdots & \vdots & \vdots \\
\bullet & \bullet & \cdots & \bullet & \bullet & \bullet
\end{array} \right]$$

$\times$ denotes a quantity that is being computed.
$\bullet$ denotes a quantity that is not being computed.
The augmented matrix size is $n \times (n+1)$.


Step 2. It requires n multiplications and n additions to introduce a zero below the leading 1, and there are $n - 1$ rows below the leading 1, so the number of flops required to introduce zeros below the leading 1 is $2n(n-1)$.


Column 1. 


Column 2. 


Column 3. 


Total for all columns. 


Column n. 


le ee «''* # « 
se 

Ox K "t+ K KI x 
O «x K t'* «x xX 
x 

eae x 

Ox x x 
O «x K '** K Xx 


Combining Steps | and 2, the number of flops required for column 1 is 


n+ an(n— 1) =2n? — 2 


Column 2. The procedure for column 2 is the same as for column 1, except that now we are
dealing with one less row and one less column. Thus, the number of flops
required to introduce the leading 1 in row 2 and the zeros below it can be obtained
by replacing n by n − 1 in the flop count for the first column. Thus, the number of
flops required for column 2 is

$2(n - 1)^2 - (n - 1)$

Column 3. By the argument for column 2, the number of flops required for column 3 is

$2(n - 2)^2 - (n - 2)$


Total for all columns. The pattern should now be clear. The total number of flops required to create the n
leading 1's and the associated zeros is

$(2n^2 - n) + [2(n-1)^2 - (n-1)] + [2(n-2)^2 - (n-2)] + \cdots + (2 - 1)$

which we can rewrite as

$2[n^2 + (n-1)^2 + \cdots + 1] - [n + (n-1) + \cdots + 1]$

or on applying Formulas 1 and 2 as

$2\,\frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)}{2} = \frac{2}{3}n^3 + \frac{1}{2}n^2 - \frac{1}{6}n$


Next, let us count the number of operations required to complete the backward 
phase (the back substitution). 


Column n. It requires n − 1 multiplications and n − 1 additions to introduce zeros above the
leading 1 in the nth column, so the total number of flops required for the column
is 2(n − 1).

Column (n − 1). The procedure is the same as for column n, except that now we are dealing with one
less row. Thus, the number of flops required for the (n − 1)st column is 2(n − 2).

Column (n − 2). By the argument for column (n − 1), the number of flops required for column
(n − 2) is 2(n − 3).

Total. The pattern should now be clear. The total number of flops to complete the
backward phase is

$2(n-1) + 2(n-2) + 2(n-3) + \cdots + 2(n-n) = 2[n^2 - (1 + 2 + \cdots + n)]$

which we can rewrite using Formula 1 as

$2n^2 - 2\,\frac{n(n+1)}{2} = n^2 - n$


In summary, we have shown that for Gauss-Jordan elimination the number of flops required for the forward and
backward phases is

flops for forward phase $= \frac{2}{3}n^3 + \frac{1}{2}n^2 - \frac{1}{6}n$ (3)

flops for backward phase $= n^2 - n$ (4)

Thus, the total cost of solving a linear system by Gauss-Jordan elimination is

flops for both phases $= \frac{2}{3}n^3 + \frac{3}{2}n^2 - \frac{7}{6}n$ (5)
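The closed forms (3), (4), and (5) can be checked against the column-by-column counts from the derivation. The following is a minimal sketch, not part of the original text; the function names are hypothetical, and exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction


def forward_flops(n):
    """Add the per-column forward counts 2j^2 - j for j = n, n-1, ..., 1."""
    return sum(2 * j * j - j for j in range(1, n + 1))


def backward_flops(n):
    """Add the per-column backward counts 2(j - 1) for j = n, n-1, ..., 1."""
    return sum(2 * (j - 1) for j in range(1, n + 1))


def forward_closed_form(n):
    """Formula (3): (2/3)n^3 + (1/2)n^2 - (1/6)n."""
    return Fraction(2, 3) * n**3 + Fraction(1, 2) * n**2 - Fraction(1, 6) * n


def backward_closed_form(n):
    """Formula (4): n^2 - n."""
    return n * n - n


def total_closed_form(n):
    """Formula (5): (2/3)n^3 + (3/2)n^2 - (7/6)n."""
    return Fraction(2, 3) * n**3 + Fraction(3, 2) * n**2 - Fraction(7, 6) * n


# Compare the summed counts with the closed forms for a few system sizes.
for n in (1, 2, 3, 10, 100):
    assert forward_flops(n) == forward_closed_form(n)
    assert backward_flops(n) == backward_closed_form(n)
    assert forward_flops(n) + backward_flops(n) == total_closed_form(n)
```

The check only confirms the algebra of the summation; it does not time or perform an actual elimination.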


Cost Estimates for Solving Large Linear Systems 


It is a property of polynomials that for large values of the independent variable the term of highest power makes the
major contribution to the value of the polynomial. Thus, for large linear systems we can use (3) and (4) to approximate
the number of flops in the forward and backward phases as

flops for forward phase $\approx \frac{2}{3}n^3$ (6)

flops for backward phase $\approx n^2$ (7)


This shows that it is more costly to execute the forward phase than the backward phase for large linear systems. 


Indeed, the cost difference between the forward and backward phases can be enormous, as the next example shows. 


EXAMPLE 1 Cost of Solving a Large Linear System

Approximate the time required to execute the forward and backward phases of Gauss-Jordan
elimination for a system of 10,000 ($= 10^4$) equations in 10,000 unknowns using a computer that can
execute 10 gigaflops per second.

Solution We have $n = 10^4$ for the given system, so from (6) and (7) the number of gigaflops required
for the forward and backward phases is

gigaflops for forward phase $\approx \frac{2}{3}n^3 \times 10^{-9} = \frac{2}{3}(10^4)^3 \times 10^{-9} = \frac{2}{3} \times 10^3$

gigaflops for backward phase $\approx n^2 \times 10^{-9} = (10^4)^2 \times 10^{-9} = 10^{-1}$

Thus, at 10 gigaflops/s the execution times for the forward and backward phases are

time for forward phase $\approx \left(\frac{2}{3} \times 10^3\right) \times 10^{-1}$ s $\approx 66.67$ s

time for backward phase $\approx (10^{-1}) \times 10^{-1}$ s $= 0.01$ s
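The arithmetic in Example 1 can be reproduced with a few lines of code. This is an illustrative sketch only; the function name `phase_times_seconds` is hypothetical, and the estimates use the large-n approximations (6) and (7) rather than the exact counts.

```python
def phase_times_seconds(n, gigaflops_per_second):
    """Estimate forward and backward phase times from (6) and (7)."""
    flops_per_second = gigaflops_per_second * 1e9
    forward = (2.0 / 3.0) * n**3 / flops_per_second   # about (2/3)n^3 flops
    backward = n**2 / flops_per_second                # about n^2 flops
    return forward, backward


# The system of Example 1: n = 10,000 on a 10 gigaflops/s machine.
forward, backward = phase_times_seconds(10_000, 10)
print(f"forward phase:  {forward:.2f} s")
print(f"backward phase: {backward:.2f} s")
```

Changing the two arguments gives the estimates asked for in the exercises that refer to Table 1.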


We leave it as an exercise for you to confirm the results in Table 1.

Table 1
Approximate Cost for an n × n Matrix A with Large n

Algorithm                                              Cost in Flops
Gauss-Jordan elimination (forward phase)               $\approx \frac{2}{3}n^3$
Gauss-Jordan elimination (backward phase)              $\approx n^2$
LU-decomposition of A                                  $\approx \frac{2}{3}n^3$
Forward substitution to solve Ly = b                   $\approx n^2$
Backward substitution to solve Ux = y                  $\approx n^2$
$A^{-1}$ by reducing [A | I] to [I | $A^{-1}$]         $\approx 2n^3$
Compute $A^{-1}b$                                      $\approx 2n^3$


Considerations in Choosing an Algorithm for Solving a Linear System 


For a single linear system Ax = b of n equations in n unknowns, the methods of LU-decomposition and Gauss-Jordan
elimination differ in bookkeeping but otherwise involve the same number of flops. Thus, neither method has
a cost advantage over the other. However, LU-decomposition has other advantages that make it the method of
choice:

• Gauss-Jordan elimination and Gaussian elimination both use the augmented matrix [A | b], so b must be known.
In contrast, LU-decomposition uses only the matrix A, so once that decomposition is known it can be used with as
many right-hand sides as are required, one at a time.

• The LU-decomposition that is computed to solve Ax = b can be used to compute $A^{-1}$, if needed, with little
additional work.

• For large linear systems in which computer memory is at a premium, one can dispense with the storage of the 1's
and zeros that appear on or below the main diagonal of U, since those entries are known from the form of U. The
space that this opens up can then be used to store the entries of L, thereby reducing the amount of memory
required to solve the system.

• If A is a large matrix consisting mostly of zeros, and if the nonzero entries are concentrated in a “band” around the
main diagonal, then there are techniques that can be used to reduce the cost of LU-decomposition, giving it an
advantage over Gauss-Jordan elimination.
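The first advantage listed above, reusing one factorization for several right-hand sides, can be sketched as follows. This is a minimal illustration under stated assumptions: no row interchanges are needed, U is taken to have 1's on its main diagonal, and the names `lu_decompose` and `lu_solve` as well as the sample matrix are chosen here for illustration only.

```python
from fractions import Fraction


def lu_decompose(A):
    """Factor A = LU with L lower triangular and U unit upper triangular.

    Assumes every pivot L[j][j] is nonzero (no row interchanges).
    """
    n = len(A)
    L = [[Fraction(0)] * n for _ in range(n)]
    U = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j, n):          # entries of column j of L
            L[i][j] = Fraction(A[i][j]) - sum(L[i][k] * U[k][j] for k in range(j))
        for i in range(j + 1, n):      # entries of row j of U right of the diagonal
            U[j][i] = (Fraction(A[j][i]) - sum(L[j][k] * U[k][i] for k in range(j))) / L[j][j]
    return L, U


def lu_solve(L, U, b):
    """Solve LUx = b by forward substitution (Ly = b), then back substitution (Ux = y)."""
    n = len(b)
    y = [Fraction(0)] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))
    return x


A = [[2, 6, 2], [-3, -8, 0], [4, 9, 2]]
L, U = lu_decompose(A)                 # the costly step, done once
for b in ([2, 2, 3], [1, 0, 0]):       # each extra right-hand side costs only two substitutions
    print(lu_solve(L, U, b))
```

The factorization is the step that costs on the order of n^3 flops; each call to `lu_solve` uses only the two substitutions, each on the order of n^2 flops.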


The cost in flops for Gaussian elimination is the
same as that for the forward phase of
Gauss-Jordan elimination.


Concept Review

• Flop
• Formula for the sum of the first n positive integers
• Formula for the sum of the squares of the first n positive integers
• Cost in flops for solving large linear systems by various methods
• Cost in flops for inverting a matrix by row reduction
• Issues to consider when choosing an algorithm to solve a large linear system

Skills

• Compute the cost of solving a linear system by Gauss-Jordan elimination.
• Approximate the time required to execute the forward and backward phases of Gauss-Jordan elimination.
• Approximate the time required to find an LU-decomposition of a matrix.
• Approximate the time required to find the inverse of an invertible matrix.


Exercise Set 9.4 


1. A certain computer can execute 10 gigaflops per second. Use Formula 5 to find the time required to solve the
system using Gauss-Jordan elimination.

(a) A system of 1000 equations in 1000 unknowns.
(b) A system of 10,000 equations in 10,000 unknowns.
(c) A system of 100,000 equations in 100,000 unknowns.

Answer:

(a) ≈ 0.067 second
(b) ≈ 66.68 seconds
(c) ≈ 66,668 seconds, or about 18.5 hours
2. A certain computer can execute 100 gigaflops per second. Use Formula 5 to find the time required to solve the
system using Gauss-Jordan elimination.
(a) A system of 10,000 equations in 10,000 unknowns.
(b) A system of 100,000 equations in 100,000 unknowns.
(c) A system of 1,000,000 equations in 1,000,000 unknowns.

3. Today's personal computers can execute 70 gigaflops per second. Use Table 1 to estimate the time required to
perform the following operations on the invertible 10,000 × 10,000 matrix A.
(a) Execute the forward phase of Gauss-Jordan elimination.
(b) Execute the backward phase of Gauss-Jordan elimination.
(c) LU-decomposition of A.
(d) Find $A^{-1}$ by reducing [A | I] to [I | $A^{-1}$].

Answer:

(a) ≈ 9.52 seconds
(b) ≈ 0.0014 second
(c) ≈ 9.52 seconds
(d) ≈ 28.6 seconds


4. The IBM Roadrunner computer can operate at speeds in excess of 1 petaflop per second (1 petaflop = $10^{15}$
flops). Use Table 1 to estimate the time required to perform the following operations on the invertible
100,000 × 100,000 matrix A.

(a) Execute the forward phase of Gauss-Jordan elimination.
(b) Execute the backward phase of Gauss-Jordan elimination.
(c) LU-decomposition of A.
(d) Find $A^{-1}$ by reducing [A | I] to [I | $A^{-1}$].

5. (a) Approximate the time required to execute the forward phase of Gauss-Jordan elimination for a system of
100,000 equations in 100,000 unknowns using a computer that can execute 1 gigaflop per second. Do the
same for the backward phase. (See Table 1.)

(b) How many gigaflops per second must a computer be able to execute to find the LU-decomposition of a
matrix of size 10,000 × 10,000 in less than 0.5 s? (See Table 1.)

Answer:

(a) $6.67 \times 10^5$ s for forward phase, 10 s for backward phase
(b) 1334


6. About how many teraflops per second must a computer be able to execute to find the inverse of a matrix of size
100,000 × 100,000 in less than 0.5 s? (1 teraflop = $10^{12}$ flops.)

In Exercises 7-10, A and B are n × n matrices and c is a real number.

7. How many flops are required to compute cA?
Answer:

$n^2$ flops

8. How many flops are required to compute A + B?

9. How many flops are required to compute AB?
Answer:

$2n^3 - n^2$ flops

10. If A is a diagonal matrix and k is a positive integer, how many flops are required to compute $A^k$?




9.5 Singular Value Decomposition 


In this section we will discuss an extension of the diagonalization theory for n × n symmetric matrices to general
m × n matrices. The results that we will develop in this section have applications to compression, storage, and
transmission of digitized information and form the basis for many of the best computational algorithms that are
currently available for solving linear systems.


Decompositions of Square Matrices 
We saw in Formula 2 of Section 7.2 that every symmetric matrix A can be expressed as

$A = PDP^T$ (1)

where P is an n × n orthogonal matrix of eigenvectors of A, and D is the diagonal matrix whose diagonal entries are
the eigenvalues corresponding to the column vectors of P. In this section we will call (1) an eigenvalue
decomposition of A (abbreviated EVD of A).

If an n × n matrix A is not symmetric, then it does not have an eigenvalue decomposition, but it does have a
Hessenberg decomposition

$A = PHP^T$

in which P is an orthogonal matrix and H is in upper Hessenberg form (Theorem 7.2.4).

Moreover, if A has real eigenvalues, then it has a Schur decomposition

$A = PSP^T$

in which P is an orthogonal matrix and S is upper triangular (Theorem 7.2.3).


The eigenvalue, Hessenberg, and Schur decompositions are important in numerical algorithms not only because the
matrices D, H, and S have simpler forms than A, but also because the orthogonal matrices that appear in these
factorizations do not magnify roundoff error. To see why this is so, suppose that $\hat{\mathbf{x}}$ is a column vector whose entries
are known exactly and that

$\mathbf{x} = \hat{\mathbf{x}} + \mathbf{e}$

is the vector that results when roundoff error is present in the entries of $\hat{\mathbf{x}}$.

If P is an orthogonal matrix, then the length-preserving property of orthogonal transformations implies that

$\|P\mathbf{x} - P\hat{\mathbf{x}}\| = \|\mathbf{x} - \hat{\mathbf{x}}\| = \|\mathbf{e}\|$

which tells us that the error in approximating $P\hat{\mathbf{x}}$ by $P\mathbf{x}$ has the same magnitude as the error in approximating $\hat{\mathbf{x}}$ by
$\mathbf{x}$.
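The length-preserving property can be illustrated numerically. The sketch below is an assumption-laden example rather than anything from the text: it uses a 2 × 2 rotation matrix as the orthogonal matrix P, and the exact vector and error vector are arbitrary illustrative values.

```python
import math


def matvec(P, x):
    """Multiply the matrix P (a list of rows) by the vector x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]


def norm(v):
    """Euclidean length of v."""
    return math.sqrt(sum(c * c for c in v))


theta = 0.7                                   # arbitrary rotation angle
P = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]      # rotation matrices are orthogonal

x_exact = [3.0, -1.0]                         # entries assumed known exactly
e = [1e-3, -2e-3]                             # illustrative roundoff error
x = [a + b for a, b in zip(x_exact, e)]

error_before = norm([a - b for a, b in zip(x, x_exact)])
error_after = norm([a - b for a, b in zip(matvec(P, x), matvec(P, x_exact))])
print(error_before, error_after)              # the two lengths agree up to rounding
```

Replacing P by a non-orthogonal invertible matrix will in general change the length of the error, which is the numerical drawback of decompositions that lack orthogonality.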
There are two main paths that one might follow in looking for other kinds of decompositions of a general square
matrix A: One might look for decompositions of the form

$A = PJP^{-1}$

in which P is invertible but not necessarily orthogonal, or one might look for decompositions of the form

$A = U\Sigma V^T$

in which U and V are orthogonal but not necessarily the same. The first path leads to decompositions in which J is
either diagonal or a certain kind of block diagonal matrix, called a Jordan canonical form in honor of the French
mathematician Camille Jordan (see p. 510). Jordan canonical forms, which we will not consider in this text, are
important theoretically and in certain applications, but they are of lesser importance numerically because of the
roundoff difficulties that result from the lack of orthogonality in P. In this section we will focus on the second path.


Singular Values 


Since matrix products of the form $A^TA$ will play an important role in our work, we will begin with two basic
theorems about them.

THEOREM 9.5.1

If A is an m × n matrix, then:

(a) A and $A^TA$ have the same null space.
(b) A and $A^TA$ have the same row space.
(c) $A^T$ and $A^TA$ have the same column space.
(d) A and $A^TA$ have the same rank.

We will prove part (a) and leave the remaining proofs for the exercises.

Proof (a) We must show that every solution of $A\mathbf{x} = \mathbf{0}$ is a solution of $A^TA\mathbf{x} = \mathbf{0}$, and conversely. If $\mathbf{x}_0$ is any
solution of $A\mathbf{x} = \mathbf{0}$, then $\mathbf{x}_0$ is also a solution of $A^TA\mathbf{x} = \mathbf{0}$ since

$A^TA\mathbf{x}_0 = A^T(A\mathbf{x}_0) = A^T\mathbf{0} = \mathbf{0}$

Conversely, if $\mathbf{x}_0$ is any solution of $A^TA\mathbf{x} = \mathbf{0}$, then $\mathbf{x}_0$ is in the null space of $A^TA$ and hence is orthogonal to all
vectors in the row space of $A^TA$ by part (a) of Theorem 4.8.10.

However, $A^TA$ is symmetric, so $\mathbf{x}_0$ is also orthogonal to every vector in the column space of $A^TA$. In particular, $\mathbf{x}_0$
must be orthogonal to the vector $(A^TA)\mathbf{x}_0$; that is,

$\mathbf{x}_0 \cdot (A^TA)\mathbf{x}_0 = 0$

Using the first formula in Table 1 of Section 3.2 and properties of the transpose operation we can rewrite this as

$\mathbf{x}_0^T(A^TA)\mathbf{x}_0 = (A\mathbf{x}_0)^T(A\mathbf{x}_0) = (A\mathbf{x}_0) \cdot (A\mathbf{x}_0) = \|A\mathbf{x}_0\|^2 = 0$

which implies that $A\mathbf{x}_0 = \mathbf{0}$, thereby proving that $\mathbf{x}_0$ is a solution of $A\mathbf{x} = \mathbf{0}$.


THEOREM 9.5.2 


If A is an m × n matrix, then:

(a) $A^TA$ is orthogonally diagonalizable.
(b) The eigenvalues of $A^TA$ are nonnegative.

Proof (a) The matrix $A^TA$, being symmetric, is orthogonally diagonalizable by Theorem 7.2.1.

Proof (b) Since $A^TA$ is orthogonally diagonalizable, there is an orthonormal basis for $R^n$ consisting of
eigenvectors of $A^TA$, say $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$. If we let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the corresponding eigenvalues, then for
$1 \le i \le n$ we have

$\|A\mathbf{v}_i\|^2 = A\mathbf{v}_i \cdot A\mathbf{v}_i = \mathbf{v}_i \cdot A^TA\mathbf{v}_i$   [Formula (26) of Section 3.2]
$= \mathbf{v}_i \cdot \lambda_i\mathbf{v}_i = \lambda_i(\mathbf{v}_i \cdot \mathbf{v}_i) = \lambda_i\|\mathbf{v}_i\|^2 = \lambda_i$

It follows from this relationship that $\lambda_i \ge 0$.


DEFINITION 1 


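If A is an m × n matrix, and if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A^TA$, then the numbers

$\sigma_1 = \sqrt{\lambda_1}, \quad \sigma_2 = \sqrt{\lambda_2}, \quad \ldots, \quad \sigma_n = \sqrt{\lambda_n}$

are called the singular values of A.

We will assume throughout this section that the
eigenvalues of $A^TA$ are named so that

$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$

and hence that

$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$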


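EXAMPLE 1 Singular Values

Find the singular values of the matrix

$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$

Solution The first step is to find the eigenvalues of the matrix

$A^TA = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$

The characteristic polynomial of $A^TA$ is

$\lambda^2 - 4\lambda + 3 = (\lambda - 3)(\lambda - 1)$

so the eigenvalues of $A^TA$ are $\lambda_1 = 3$ and $\lambda_2 = 1$ and the singular values of A in order of decreasing
size are

$\sigma_1 = \sqrt{\lambda_1} = \sqrt{3}, \quad \sigma_2 = \sqrt{\lambda_2} = 1$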
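The computation in Example 1 can be sketched in code. This is a minimal illustration that only handles matrices with exactly two columns, so that the eigenvalues of the 2 × 2 matrix $A^TA$ follow from the quadratic formula; the function names are hypothetical.

```python
import math


def gram(A):
    """Return A^T A for a matrix A given as a list of rows."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]


def singular_values_two_columns(A):
    """Singular values, largest first, of a matrix with exactly two columns."""
    (a, b), (_, d) = gram(A)              # A^T A is symmetric: [[a, b], [b, d]]
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b * b)
    eigenvalues = [mean + radius, mean - radius]
    # Clamp at zero so that rounding can never produce a negative radicand.
    return [math.sqrt(max(lam, 0.0)) for lam in eigenvalues]


A = [[1, 1], [0, 1], [1, 0]]
print(gram(A))                            # the matrix A^T A from Example 1
print(singular_values_two_columns(A))     # approximately sqrt(3) and 1
```

For larger matrices the eigenvalues of $A^TA$ are not available in closed form, and practical software computes singular values by other means.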


Singular Value Decomposition 


Before turning to the main result in this section, we will find it useful to extend the notion of a “main diagonal” to
matrices that are not square. We define the main diagonal of an m × n matrix to be the line of entries shown in
Figure 9.5.1; it starts at the upper left corner and extends diagonally as far as it can go. We will refer to the entries
on the main diagonal as the diagonal entries.

Figure 9.5.1 Main diagonal

We are now ready to consider the main result in this section, which is concerned with a specific way of factoring a
general m × n matrix A. This factorization, called singular value decomposition (abbreviated SVD), will be given in
two forms, a brief form that captures the main idea, and an expanded form that spells out the details. The proof is
given at the end of this section.


THEOREM 9.5.3. Singular Value Decomposition 


If A is an m × n matrix, then A can be expressed in the form

$A = U\Sigma V^T$

where U and V are orthogonal matrices and $\Sigma$ is an m × n matrix whose diagonal entries are the singular
values of A and whose other entries are zero.


Harry Bateman (1882-1946) 


Historical Note The term singular value is apparently due to the British-born mathematician Harry 
Bateman, who used it in a research paper published in 1908. Bateman emigrated to the United States in 
1910, teaching at Bryn Mawr College, Johns Hopkins University, and finally at the California Institute of 
Technology. Interestingly, he was awarded his Ph.D. in 1913 by Johns Hopkins at which point in time he 
was already an eminent mathematician with 60 publications to his name. 

[Image: Courtesy of the Archives, California Institute of Technology]


THEOREM 9.5.4 Singular Value Decomposition (Expanded Form) 


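If A is an m × n matrix of rank k, then A can be factored as

$A = U\Sigma V^T = \left[\begin{array}{cccc|ccc} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_k & \mathbf{u}_{k+1} & \cdots & \mathbf{u}_m \end{array}\right] \left[\begin{array}{c|c} \begin{matrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_k \end{matrix} & 0_{k \times (n-k)} \\ \hline 0_{(m-k) \times k} & 0_{(m-k) \times (n-k)} \end{array}\right] \begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_k^T \\ \mathbf{v}_{k+1}^T \\ \vdots \\ \mathbf{v}_n^T \end{bmatrix}$

in which U, $\Sigma$, and V have sizes m × m, m × n, and n × n, respectively, and in which

(a) $V = [\mathbf{v}_1 \ \mathbf{v}_2 \ \cdots \ \mathbf{v}_n]$ orthogonally diagonalizes $A^TA$.

(b) The nonzero diagonal entries of $\Sigma$ are $\sigma_1 = \sqrt{\lambda_1}, \sigma_2 = \sqrt{\lambda_2}, \ldots, \sigma_k = \sqrt{\lambda_k}$, where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are the
nonzero eigenvalues of $A^TA$ corresponding to the column vectors of V.

(c) The column vectors of V are ordered so that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k > 0$.

(d) $\mathbf{u}_i = \frac{A\mathbf{v}_i}{\|A\mathbf{v}_i\|} = \frac{1}{\sigma_i}A\mathbf{v}_i \quad (i = 1, 2, \ldots, k)$

(e) $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ is an orthonormal basis for col(A).

(f) $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k, \mathbf{u}_{k+1}, \ldots, \mathbf{u}_m\}$ is an extension of $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ to an orthonormal basis for $R^m$.

The vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k$ are called the left
singular vectors of A, and the vectors
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are called the right singular vectors
of A.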


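EXAMPLE 2 Singular Value Decomposition if A Is Not Square

Find a singular value decomposition of the matrix

$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$

Solution We showed in Example 1 that the eigenvalues of $A^TA$ are $\lambda_1 = 3$ and $\lambda_2 = 1$ and that the
corresponding singular values of A are $\sigma_1 = \sqrt{3}$ and $\sigma_2 = 1$. We leave it for you to verify that

$\mathbf{v}_1 = \begin{bmatrix} \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} \frac{\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} \end{bmatrix}$

are eigenvectors corresponding to $\lambda_1$ and $\lambda_2$, respectively, and that $V = [\mathbf{v}_1 \mid \mathbf{v}_2]$ orthogonally
diagonalizes $A^TA$. From part (d) of Theorem 9.5.4, the vectors

$\mathbf{u}_1 = \frac{1}{\sigma_1}A\mathbf{v}_1 = \begin{bmatrix} \frac{\sqrt{6}}{3} \\ \frac{\sqrt{6}}{6} \\ \frac{\sqrt{6}}{6} \end{bmatrix}$ and $\mathbf{u}_2 = \frac{1}{\sigma_2}A\mathbf{v}_2 = \begin{bmatrix} 0 \\ -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix}$

are two of the three column vectors of U. Note that $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthonormal, as expected. We could
extend the set $\{\mathbf{u}_1, \mathbf{u}_2\}$ to an orthonormal basis for $R^3$. However, the computations will be easier if
we first remove the messy radicals by multiplying $\mathbf{u}_1$ and $\mathbf{u}_2$ by appropriate scalars. Thus, we will look
for a unit vector $\mathbf{u}_3$ that is orthogonal to

$\sqrt{6}\,\mathbf{u}_1 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}$ and $\sqrt{2}\,\mathbf{u}_2 = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}$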


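To satisfy these two orthogonality conditions, the vector $\mathbf{u}_3$ must be a solution of the homogeneous
linear system

$\begin{bmatrix} 2 & 1 & 1 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$

We leave it for you to show that a general solution of this system is

$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = t \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}$

Normalizing the vector on the right yields

$\mathbf{u}_3 = \begin{bmatrix} -\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix}$

Thus, the singular value decomposition of A is

$\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{6}}{3} & 0 & -\frac{1}{\sqrt{3}} \\ \frac{\sqrt{6}}{6} & -\frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}} \\ \frac{\sqrt{6}}{6} & \frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}} \end{bmatrix} \begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}$

$A = U\Sigma V^T$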
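The factorization found in Example 2 can be checked by multiplying the three factors back together. The sketch below is illustrative only; it enters the factors in floating point, with helper names chosen here, and confirms that $U\Sigma V^T$ reproduces A and that the columns of U are orthonormal.

```python
import math


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]


s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)

U = [[s6 / 3, 0.0, -1 / s3],
     [s6 / 6, -s2 / 2, 1 / s3],
     [s6 / 6, s2 / 2, 1 / s3]]
Sigma = [[s3, 0.0],
         [0.0, 1.0],
         [0.0, 0.0]]
Vt = [[s2 / 2, s2 / 2],
      [s2 / 2, -s2 / 2]]

product = matmul(matmul(U, Sigma), Vt)   # should reproduce A up to rounding
UtU = matmul(transpose(U), U)            # should be the 3 x 3 identity up to rounding

for row in product:
    print([round(entry, 12) for entry in row])
```

A singular value decomposition is not unique, so a different but equally valid choice of signs for the singular vectors would pass the same check.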


Eugenio Beltrami (1835-1900) 


Gene H. Golub (1932-) 


Historical Note The theory of singular value decompositions can be traced back to the work of five 
people: the Italian mathematician Eugenio Beltrami, the French mathematician Camille Jordan, the English 
mathematician James Sylvester (see p. 34), and the German mathematicians Erhard Schmidt (see p. 360)
and Hermann Weyl. More recently, the pioneering efforts of the American mathematician
Gene Golub produced a stable and efficient algorithm for computing it. Beltrami and Jordan were the 
progenitors of the decomposition—Beltrami gave a proof of the result for real, invertible matrices with 
distinct singular values in 1873. Subsequently, Jordan refined the theory and eliminated the unnecessary 
restrictions imposed by Beltrami. Sylvester, apparently unfamiliar with the work of Beltrami and Jordan, 
rediscovered the result in 1889 and suggested its importance. Schmidt was the first person to show that the 
singular value decomposition could be used to approximate a matrix by another matrix with lower rank, 
and, in so doing, he transformed it from a mathematical curiosity to an important practical tool. Weyl 
showed how to find the lower rank approximations in the presence of error. 

[Images: wikipedia (Beltrami); The Granger Collection, New York (Jordan); Courtesy Electronic Publishing
Services, Inc., New York City (Weyl); wikipedia (Golub)]


OPTIONAL 


We conclude this section with an optional proof of Theorem 9.5.4. 


Proof of Theorem 9.5.4 For notational simplicity we will prove this theorem in the case where A is an n × n
matrix. To modify the argument for an m × n matrix you need only make the notational adjustments required to
account for the possibility that m > n or n > m.


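The matrix $A^TA$ is symmetric, so it has an eigenvalue decomposition

$A^TA = VDV^T$

in which the column vectors of

$V = [\mathbf{v}_1 \mid \mathbf{v}_2 \mid \cdots \mid \mathbf{v}_n]$

are unit eigenvectors of $A^TA$, and D is a diagonal matrix whose successive diagonal entries $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the
eigenvalues of $A^TA$ corresponding in succession to the column vectors of V. Since A is assumed to have rank k, it
follows from Theorem 9.5.1 that $A^TA$ also has rank k. It follows as well that D has rank k, since it is similar to $A^TA$
and rank is a similarity invariant. Thus, D can be expressed in the form

$D = \begin{bmatrix} \lambda_1 & & & & & \\ & \lambda_2 & & & & \\ & & \ddots & & & \\ & & & \lambda_k & & \\ & & & & 0 & \\ & & & & & \ddots \end{bmatrix}$ (2)

(all entries off the main diagonal are zero, and the last n − k diagonal entries are zero), where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k > 0$. Now let us consider the set of image vectors

$\{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_n\}$ (3)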


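This is an orthogonal set, for if $i \ne j$, then the orthogonality of $\mathbf{v}_i$ and $\mathbf{v}_j$ implies that

$A\mathbf{v}_i \cdot A\mathbf{v}_j = \mathbf{v}_i \cdot A^TA\mathbf{v}_j = \mathbf{v}_i \cdot \lambda_j\mathbf{v}_j = \lambda_j(\mathbf{v}_i \cdot \mathbf{v}_j) = 0$

Moreover, the first k vectors in (3) are nonzero since we showed in the proof of Theorem 9.5.2b that $\|A\mathbf{v}_i\|^2 = \lambda_i$ for
$i = 1, 2, \ldots, n$, and we have assumed that the first k diagonal entries in (2) are positive. Thus,

$S = \{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_k\}$

is an orthogonal set of nonzero vectors in the column space of A. But the column space of A has dimension k since

$\text{rank}(A) = \text{rank}(A^TA) = k$

and hence S, being a linearly independent set of k vectors, must be an orthogonal basis for col(A). If we now
normalize the vectors in S, we will obtain an orthonormal basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ for col(A) in which

$\mathbf{u}_i = \frac{A\mathbf{v}_i}{\|A\mathbf{v}_i\|} = \frac{1}{\sqrt{\lambda_i}}A\mathbf{v}_i \quad (1 \le i \le k)$

or, equivalently, in which

$A\mathbf{v}_1 = \sqrt{\lambda_1}\,\mathbf{u}_1 = \sigma_1\mathbf{u}_1, \quad A\mathbf{v}_2 = \sqrt{\lambda_2}\,\mathbf{u}_2 = \sigma_2\mathbf{u}_2, \quad \ldots, \quad A\mathbf{v}_k = \sqrt{\lambda_k}\,\mathbf{u}_k = \sigma_k\mathbf{u}_k$ (4)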


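It follows from Theorem 6.3.6 that we can extend this to an orthonormal basis

$\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k, \mathbf{u}_{k+1}, \ldots, \mathbf{u}_n\}$

for $R^n$. Now let U be the orthogonal matrix

$U = [\mathbf{u}_1 \ \mathbf{u}_2 \ \cdots \ \mathbf{u}_k \ \mathbf{u}_{k+1} \ \cdots \ \mathbf{u}_n]$

and let $\Sigma$ be the diagonal matrix

$\Sigma = \begin{bmatrix} \sigma_1 & & & & & \\ & \sigma_2 & & & & \\ & & \ddots & & & \\ & & & \sigma_k & & \\ & & & & 0 & \\ & & & & & \ddots \end{bmatrix}$

It follows from (4), and the fact that $A\mathbf{v}_i = \mathbf{0}$ for $i > k$, that

$U\Sigma = [\sigma_1\mathbf{u}_1 \ \sigma_2\mathbf{u}_2 \ \cdots \ \sigma_k\mathbf{u}_k \ \mathbf{0} \ \cdots \ \mathbf{0}]$
$= [A\mathbf{v}_1 \ A\mathbf{v}_2 \ \cdots \ A\mathbf{v}_k \ A\mathbf{v}_{k+1} \ \cdots \ A\mathbf{v}_n]$
$= AV$

which we can rewrite using the orthogonality of V as $A = U\Sigma V^T$.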


Concept Review
• Eigenvalue decomposition
• Hessenberg decomposition
• Schur decomposition
• Magnification of roundoff error
• Properties that A and $A^TA$ have in common
• $A^TA$ is orthogonally diagonalizable
• Eigenvalues of $A^TA$ are nonnegative
• Singular values
• Diagonal entries of a matrix that is not square
• Singular value decomposition

Skills
• Find the singular values of an m × n matrix.
• Find a singular value decomposition of an m × n matrix.


Exercise Set 9.5 




In Exercises 1-4, find the distinct singular values of A.
1. A = [1 2 0]


Answer: 


In Exercises 5—12, find a singular value decomposition of A. 


5,,_[1 —1 
alae! 


Answer: 


Answer: 


| 
Wah Wop La [Do 
— &- | 
ae 
S es) 
AIS 


Oo 


13. Prove: If A is an m × n matrix, then $A^TA$ and $AA^T$ have the same rank.

14. Prove part (d) of Theorem 9.5.1 by using part (a) of the theorem and the fact that A and $A^TA$ have n columns.

15. (a) Prove part (b) of Theorem 9.5.1 by first showing that row($A^TA$) is a subspace of row(A).

(b) Prove part (c) of Theorem 9.5.1 by using part (d).

16. Let $T: R^n \to R^m$ be a linear transformation whose standard matrix A has the singular value decomposition
$A = U\Sigma V^T$, and let $B = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ and $B' = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_m\}$ be the column vectors of V and U,
respectively. Show that $\Sigma = [T]_{B',B}$.

17. Show that the singular values of $A^TA$ are the squares of the singular values of A.

18. Show that if $A = U\Sigma V^T$ is a singular value decomposition of A, then U orthogonally diagonalizes $AA^T$.

True-False Exercises
True-False Exercises 

In parts (a)-(g) determine whether the statement is true or false, and justify your answer.

(a) If A is an m × n matrix, then $A^TA$ is an m × m matrix.
Answer:
False

(b) If A is an m × n matrix, then $A^TA$ is a symmetric matrix.
Answer:
True

(c) If A is an m × n matrix, then the eigenvalues of $A^TA$ are positive real numbers.
Answer:
False

(d) If A is an n × n matrix, then A is orthogonally diagonalizable.
Answer:
False

(e) If A is an m × n matrix, then $A^TA$ is orthogonally diagonalizable.
Answer:
True

(f) The eigenvalues of $A^TA$ are the singular values of A.
Answer:
False

(g) Every m × n matrix has a singular value decomposition.
Answer:
True




9.6 Data Compression Using Singular Value Decomposition 


Efficient transmission and storage of large quantities of digital data have become a major problem in our technological world. In this section
we will discuss the role that singular value decomposition plays in compressing digital data so that it can be transmitted more rapidly and
stored in less space. We assume here that you have read Section 9.5.


Reduced Singular Value Decomposition 


Algebraically, the zero rows and columns of the matrix $\Sigma$ in Theorem 9.5.4 are superfluous and can be eliminated by multiplying out the
expression $U\Sigma V^T$ using block multiplication and the partitioning shown in that formula. The products that involve zero blocks as factors
drop out, leaving

$A = [\mathbf{u}_1 \ \mathbf{u}_2 \ \cdots \ \mathbf{u}_k] \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_k \end{bmatrix} \begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_k^T \end{bmatrix}$ (1)

which is called a reduced singular value decomposition of A. In this text we will denote the matrices on the right side of (1) by $U_1$, $\Sigma_1$, and
$V_1^T$, respectively, and we will write this equation as

$A = U_1\Sigma_1V_1^T$ (2)

Note that the sizes of $U_1$, $\Sigma_1$, and $V_1^T$ are m × k, k × k, and k × n, respectively, and that the matrix $\Sigma_1$ is invertible, since its diagonal
entries are positive.

If we multiply out on the right side of (1) using the column-row rule, then we obtain

$A = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_k\mathbf{u}_k\mathbf{v}_k^T$ (3)

which is called a reduced singular value expansion of A. This result applies to all matrices, whereas the spectral decomposition [Formula
7 of Section 7.2] applies only to symmetric matrices.


Remark It can be proved that an m × n matrix M has rank 1 if and only if it can be factored as $M = \mathbf{u}\mathbf{v}^T$, where $\mathbf{u}$ is a column vector in
$R^m$ and $\mathbf{v}$ is a column vector in $R^n$. Thus, a reduced singular value decomposition expresses a matrix A of rank k as a linear combination
of k rank 1 matrices.


EXAMPLE 1 Reduced Singular Value Decomposition 


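Find a reduced singular value decomposition and a reduced singular value expansion of the matrix

$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$

Solution In Example 2 of Section 9.5 we found the singular value decomposition

$\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{6}}{3} & 0 & -\frac{1}{\sqrt{3}} \\ \frac{\sqrt{6}}{6} & -\frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}} \\ \frac{\sqrt{6}}{6} & \frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}} \end{bmatrix} \begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}$ (4)

$A = U\Sigma V^T$

Since A has rank 2 (verify), it follows from (1) with k = 2 that the reduced singular value decomposition of A corresponding
to (4) is

$\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{6}}{3} & 0 \\ \frac{\sqrt{6}}{6} & -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{6}}{6} & \frac{\sqrt{2}}{2} \end{bmatrix} \begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}$

This yields the reduced singular value expansion

$\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T = \sqrt{3} \begin{bmatrix} \frac{\sqrt{6}}{3} \\ \frac{\sqrt{6}}{6} \\ \frac{\sqrt{6}}{6} \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \end{bmatrix} + (1) \begin{bmatrix} 0 \\ -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}$

$= \sqrt{3} \begin{bmatrix} \frac{\sqrt{3}}{3} & \frac{\sqrt{3}}{3} \\ \frac{\sqrt{3}}{6} & \frac{\sqrt{3}}{6} \\ \frac{\sqrt{3}}{6} & \frac{\sqrt{3}}{6} \end{bmatrix} + (1) \begin{bmatrix} 0 & 0 \\ -\frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & -\frac{1}{2} \end{bmatrix}$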

Note that the matrices in the expansion have rank 1, as expected. 
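A reduced singular value expansion is a sum of scaled outer products, so it is straightforward to evaluate term by term. The sketch below is illustrative; the function names are hypothetical, and the singular values and vectors are those of Example 1 entered in floating point. Keeping only the first r terms gives a matrix of rank at most r.

```python
import math


def outer(u, v):
    """Outer product u v^T of a column vector u and a column vector v."""
    return [[ui * vj for vj in v] for ui in u]


def partial_expansion(sigmas, us, vs, r):
    """Sum of the first r terms sigma_i * u_i * v_i^T of the expansion."""
    m, n = len(us[0]), len(vs[0])
    total = [[0.0] * n for _ in range(m)]
    for sigma, u, v in list(zip(sigmas, us, vs))[:r]:
        term = outer(u, v)
        for i in range(m):
            for j in range(n):
                total[i][j] += sigma * term[i][j]
    return total


s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
sigmas = [s3, 1.0]
us = [[s6 / 3, s6 / 6, s6 / 6], [0.0, -s2 / 2, s2 / 2]]   # u_1, u_2
vs = [[s2 / 2, s2 / 2], [s2 / 2, -s2 / 2]]                # v_1, v_2

full = partial_expansion(sigmas, us, vs, 2)       # both terms: recovers A
first_term = partial_expansion(sigmas, us, vs, 1) # first term only: a rank 1 matrix
print(full)
print(first_term)
```

With both terms the sum reproduces A up to rounding; with one term it gives the first rank 1 matrix of the expansion.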


Data Compression and Image Processing 


Singular value decompositions can be used to “compress” visual information for the purpose of reducing its required storage space and 
speeding up its electronic transmission. The first step in compressing a visual image is to represent it as a numerical matrix from which the 
visual image can be recovered when needed. 


For example, a black and white photograph might be scanned as a rectangular array of pixels (points) and then stored as a matrix A by 
assigning each pixel a numerical value in accordance with its gray level. If 256 different gray levels are used (0 = white to 255 = black), 
then the entries in the matrix would be integers between 0 and 255. The image can be recovered from the matrix A by printing or 
displaying the pixels with their assigned gray levels. 


Historical Note In 1924 the U.S. Federal Bureau of Investigation (FBI) began collecting fingerprints and handprints and now
has more than 30 million such prints in its files. To reduce the storage cost, the FBI began working with the Los Alamos National
Laboratory, the National Bureau of Standards, and other groups in 1993 to devise rank-based compression methods for storing
prints in digital form. The following figure shows an original fingerprint and a reconstruction from digital data that was
compressed at a ratio of 26:1.

[Figure panels: Original, Reconstruction]


If the matrix A has size m × n, then one might store each of its mn entries individually. An alternative procedure is to compute the reduced
singular value decomposition

$A = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_k\mathbf{u}_k\mathbf{v}_k^T$ (5)

in which $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k$, and store the $\sigma$'s, the $\mathbf{u}$'s, and the $\mathbf{v}$'s.

When needed, the matrix A (and hence the image it represents) can be reconstructed from (5). Since each $\mathbf{u}_j$ has m entries and each $\mathbf{v}_j$ has n
entries, this method requires storage space for

$km + kn + k = k(m + n + 1)$

numbers. Suppose, however, that the singular values $\sigma_{r+1}, \ldots, \sigma_k$ are sufficiently small that dropping the corresponding terms in (5)
produces an acceptable approximation

$A_r = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_r\mathbf{u}_r\mathbf{v}_r^T$ (6)

to A and the image that it represents. We call (6) the rank r approximation of A. This matrix requires storage space for only

$rm + rn + r = r(m + n + 1)$

numbers, compared to mn numbers required for entry-by-entry storage of A. For example, the rank 100 approximation of a 1000 × 1000
matrix A requires storage for only

$100(1000 + 1000 + 1) = 200{,}100$

numbers, compared to the 1,000,000 numbers required for entry-by-entry storage of A, a compression of almost 80%.
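The storage count r(m + n + 1) and the resulting saving are easy to tabulate. The following is a small sketch with hypothetical function names; it only counts stored numbers and says nothing about how good the approximation looks.

```python
def svd_storage(m, n, r):
    """Numbers stored for a rank r approximation: r sigmas, r u's, r v's."""
    return r * (m + n + 1)


def fraction_saved(m, n, r):
    """Fraction of the mn entry-by-entry storage that is saved."""
    return 1 - svd_storage(m, n, r) / (m * n)


# The example from the text: rank 100 approximation of a 1000 x 1000 matrix.
print(svd_storage(1000, 1000, 100))
print(f"{fraction_saved(1000, 1000, 100):.2%}")
```

The saving becomes negative once r(m + n + 1) exceeds mn, so the scheme only compresses when r is small relative to the dimensions of the matrix.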


Figure 9.6.1 shows some approximations of a digitized mandrill image obtained using 6. 


Figure 9.6.1 [Panels: Rank 4, Rank 10, Rank 20, Rank 50, Rank 128]


Concept Review
• Reduced singular value decomposition
• Reduced singular value expansion
• Rank of an approximation

Skills
• Find the reduced singular value decomposition of an m × n matrix.
• Find the reduced singular value expansion of an m × n matrix.


Exercise Set 9.6 


In Exercises 1-4, find a reduced singular value decomposition of A. [Note: Each matrix appears in Exercise Set 9.5, where you were
asked to find its (unreduced) singular value decomposition.]

1. $A = \begin{bmatrix} -2 & 2 \\ -1 & 1 \\ 2 & -2 \end{bmatrix}$

Answer: 


ok Ok 
oon + + 


hs 
II 
RON 


In Exercises 5—8, find a reduced singular value expansion of A. 
5. The matrix A in Exercise 1. 


Answer: 


342 


war 
sl 
ee | 


+ 


WIN bole LoDo 


6. The matrix A in Exercise 2. 


7. The matrix A in Exercise 3. 


Answer: 


8. The matrix A in Exercise 4. 


9. Suppose A is a 200 × 500 matrix. How many numbers must be stored in the rank 100 approximation of A? Compare this with the
number of entries of A.

Answer:
70,100 numbers must be stored; A has 100,000 entries


True-False Exercises 


In parts (a)-(c) determine whether the statement is true or false, and justify your answer. Assume that $U_1\Sigma_1V_1^T$ is a reduced singular
value decomposition of an m × n matrix of rank k.

(a) $U_1$ has size m × k.
Answer:
True

(b) $\Sigma_1$ has size k × k.
Answer:
True

(c) $V_1$ has size k × n.
Answer:
False




Chapter 9 Supplementary Exercises 


* Find an LU-decomposition of A = 7 al 


Answer: 


2 0]|=3 1 
=—2 1 0 2 
. Find the LDU-decomposition of the matrix A in Exercise 1. 


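3. Find an LU-decomposition of $A = \begin{bmatrix} 2 & 4 & 6 \\ 1 & 4 & 7 \\ 1 & 3 & 7 \end{bmatrix}$.

Answer:

$\begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$

4. Find the LDU-decomposition of the matrix A in Exercise 3.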


“Let A= F I and xg = a 


(a) Identify the dominant eigenvalue of A and then find the corresponding dominant unit eigenvector \(\mathbf{v}\)
with positive entries.


(b) Apply the power method with Euclidean scaling to A and \(\mathbf{x}_0\), stopping at \(\mathbf{x}_5\). Compare your value of
\(\mathbf{x}_5\) to the eigenvector \(\mathbf{v}\) found in part (a).


(c) Apply the power method with maximum entry scaling to A and xg, stopping at x5. Compare your 


result with the eigenvector | 


Answer: 


(a) 


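(b) \( \mathbf{x}_5 \approx \begin{bmatrix} 0.7100 \\ 0.7041 \end{bmatrix}, \quad \mathbf{v} \approx \begin{bmatrix} 0.7071 \\ 0.7071 \end{bmatrix} \)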


1 
| spiel 


. Consider the symmetric matrix 


Discuss the behavior of the power sequence
\[ \mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_k, \ldots \]
with Euclidean scaling for a general nonzero vector \(\mathbf{x}_0\). What is it about the matrix that causes the
observed behavior?


7. Suppose that a symmetric matrix A has distinct eigenvalues \(\lambda_1 = 8\), \(\lambda_2 = 1.4\), \(\lambda_3 = 2.3\), and \(\lambda_4 = -8.1\).
What can you say about the convergence of the Rayleigh quotients?


. Find a singular value decomposition of A = a | 


9. Find a singular value decomposition of \( A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 1 & 1 \end{bmatrix} \).
Answer: 
etter g, joke 


10. Find a reduced singular value decomposition and a reduced singular value expansion of the matrix A in 
Exercise 9. 


11. Find the reduced singular value decomposition of the matrix whose singular value decomposition is 


a gk 
a a ae Bs it 8 
12 Jk od a Pet 00) ee. “8 
Au|2. 2 2 roa | fe i a || 
ao 27 De Te OO US Ss 
2 2 2 2110 oo|/_1 2 2 
qe geile ook 3 3 3 
> 2 
Answer: 
dd 
2 32 
12 0 6 nee a Sa 4 
4 =-8 10/_ |2 2/[24 ol/3 3. 3 
bee apa ale sede 2 _1 
12 0.6 2 Z 3 3 3 
La 
2 <2 


12. Do orthogonally similar matrices have the same singular values? Justify your answer. 


13. If P is the standard matrix for the orthogonal projection of \(R^n\) onto a subspace W, what can you say about
the singular values of P?




CHAPTER 10


Applications of Linear Algebra


CHAPTER CONTENTS


10.1. Constructing Curves and Surfaces Through Specified Points
10.2. Geometric Linear Programming
10.3. The Earliest Applications of Linear Algebra
10.4. Cubic Spline Interpolation
10.5. Markov Chains
10.6. Graph Theory
10.7. Games of Strategy
10.8. Leontief Economic Models
10.9. Forest Management
10.10. Computer Graphics
10.11. Equilibrium Temperature Distributions
10.12. Computed Tomography
10.13. Fractals
10.14. Chaos
10.15. Cryptography
10.16. Genetics
10.17. Age-Specific Population Growth
10.18. Harvesting of Animal Populations
10.19. A Least Squares Model for Human Hearing
10.20. Warps and Morphs


INTRODUCTION 


This chapter consists of 20 applications of linear algebra. With one clearly marked
exception, each application is in its own independent section, so sections can be deleted or
permuted as desired. Each topic begins with a list of linear algebra prerequisites.


Because our primary objective in this chapter is to present applications of linear algebra,
proofs are often omitted. Whenever results from other fields are needed, they are stated
precisely, with motivation where possible, but usually without proof.




10.1 Constructing Curves and Surfaces Through 
Specified Points 


In this section we describe a technique that uses determinants to construct lines, circles, and general conic 
sections through specified points in the plane. The procedure is also used to pass planes and spheres in 3-space 
through fixed points. 


Prerequisites 


Linear Systems 
Determinants 


Analytic Geometry 


The following theorem follows from Theorem 2.3.8. 


THEOREM 10.1.1 


A homogeneous linear system with as many equations as unknowns has a nontrivial solution if and only 
if the determinant of the coefficient matrix is zero. 


We will now show how this result can be used to determine equations of various curves and surfaces through 
specified points. 
A Line Through Two Points 
Suppose that \((x_1, y_1)\) and \((x_2, y_2)\) are two distinct points in the plane. There exists a unique line

\[ c_1 x + c_2 y + c_3 = 0 \tag{1} \]

that passes through these two points (Figure 10.1.1). Note that \(c_1\), \(c_2\), and \(c_3\) are not all zero and that these
coefficients are unique only up to a multiplicative constant. Because \((x_1, y_1)\) and \((x_2, y_2)\) lie on the line,
substituting them in (1) gives the two equations

\[ c_1 x_1 + c_2 y_1 + c_3 = 0 \tag{2} \]

\[ c_1 x_2 + c_2 y_2 + c_3 = 0 \tag{3} \]


Figure 10.1.1 


The three equations, (1), (2), and (3), can be grouped together and rewritten as

\[
\begin{aligned}
x c_1 + y c_2 + c_3 &= 0 \\
x_1 c_1 + y_1 c_2 + c_3 &= 0 \\
x_2 c_1 + y_2 c_2 + c_3 &= 0
\end{aligned}
\]

which is a homogeneous linear system of three equations for \(c_1\), \(c_2\), and \(c_3\). Because \(c_1\), \(c_2\), and \(c_3\) are not all
zero, this system has a nontrivial solution, so the determinant of the coefficient matrix of the system must be
zero. That is,

\[
\begin{vmatrix} x & y & 1 \\ x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \end{vmatrix} = 0 \tag{4}
\]

Consequently, every point (x, y) on the line satisfies (4); conversely, it can be shown that every point (x, y) that
satisfies (4) lies on the line.


EXAMPLE 1 Equation of a Line
Find the equation of the line that passes through the two points (2, 1) and (3, 7).


Solution Substituting the coordinates of the two points into Equation 4 gives

\[
\begin{vmatrix} x & y & 1 \\ 2 & 1 & 1 \\ 3 & 7 & 1 \end{vmatrix} = 0
\]

The cofactor expansion of this determinant along the first row then gives

\[ -6x + y + 11 = 0 \]
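
The cofactor expansion used in this example can be sketched in a few lines of Python. This is only an illustration, not part of the text's method: the function name `line_through` is hypothetical, and exact rational arithmetic is used so that the coefficients come out without rounding.

```python
from fractions import Fraction


def line_through(p, q):
    """Return (c1, c2, c3) with c1*x + c2*y + c3 = 0 through points p and q.

    The coefficients are the cofactors of the first row (x, y, 1) of the
    3x3 determinant whose other rows are (x1, y1, 1) and (x2, y2, 1).
    """
    (x1, y1), (x2, y2) = [(Fraction(a), Fraction(b)) for a, b in (p, q)]
    c1 = y1 - y2              # +det [[y1, 1], [y2, 1]]
    c2 = -(x1 - x2)           # -det [[x1, 1], [x2, 1]]
    c3 = x1 * y2 - x2 * y1    # +det [[x1, y1], [x2, y2]]
    return c1, c2, c3


coefficients = line_through((2, 1), (3, 7))
print(coefficients)
```

For the points (2, 1) and (3, 7) the three cofactors are −6, 1, and 11, matching the equation obtained above.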


A Circle Through Three Points 


Suppose that there are three distinct points in the plane, \((x_1, y_1)\), \((x_2, y_2)\), and \((x_3, y_3)\), not all lying on a
straight line. From analytic geometry we know that there is a unique circle, say,

\[ c_1(x^2 + y^2) + c_2 x + c_3 y + c_4 = 0 \tag{5} \]

that passes through them (Figure 10.1.2). Substituting the coordinates of the three points into this equation gives

\[ c_1(x_1^2 + y_1^2) + c_2 x_1 + c_3 y_1 + c_4 = 0 \tag{6} \]

\[ c_1(x_2^2 + y_2^2) + c_2 x_2 + c_3 y_2 + c_4 = 0 \tag{7} \]

\[ c_1(x_3^2 + y_3^2) + c_2 x_3 + c_3 y_3 + c_4 = 0 \tag{8} \]

As before, Equations 5 through 8 form a homogeneous linear system with a nontrivial solution for \(c_1\), \(c_2\), \(c_3\),
and \(c_4\). Thus the determinant of the coefficient matrix is zero:

\[
\begin{vmatrix}
x^2 + y^2 & x & y & 1 \\
x_1^2 + y_1^2 & x_1 & y_1 & 1 \\
x_2^2 + y_2^2 & x_2 & y_2 & 1 \\
x_3^2 + y_3^2 & x_3 & y_3 & 1
\end{vmatrix} = 0 \tag{9}
\]

This is a determinant form for the equation of the circle.


Figure 10.1.2 


EXAMPLE 2 Equation of a Circle
Find the equation of the circle that passes through the three points (1, 7), (6, 2), and (4, 6).


Solution Substituting the coordinates of the three points into Equation 9 gives

\[
\begin{vmatrix}
x^2 + y^2 & x & y & 1 \\
50 & 1 & 7 & 1 \\
40 & 6 & 2 & 1 \\
52 & 4 & 6 & 1
\end{vmatrix} = 0
\]

which reduces to

\[ 10(x^2 + y^2) - 20x - 40y - 200 = 0 \]

In standard form this is

\[ (x - 1)^2 + (y - 2)^2 = 5^2 \]

Thus the circle has center (1, 2) and radius 5.
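
As a check on this example, the following sketch expands the 4 × 4 determinant along its first row and then reads off the center and radius by completing the square. The helper names `det` and `circle_through` are illustrative assumptions, and the recursive determinant is meant only for small matrices like this one.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))


def circle_through(points):
    """Coefficients (c1, c2, c3, c4) of c1*(x^2 + y^2) + c2*x + c3*y + c4 = 0."""
    rows = [[Fraction(x * x + y * y), Fraction(x), Fraction(y), Fraction(1)]
            for x, y in points]
    # Cofactors of the symbolic first row (x^2 + y^2, x, y, 1).
    return tuple((-1) ** j * det([r[:j] + r[j + 1:] for r in rows])
                 for j in range(4))


c1, c2, c3, c4 = circle_through([(1, 7), (6, 2), (4, 6)])
# Completing the square in c1*(x^2 + y^2) + c2*x + c3*y + c4 = 0.
center = (-c2 / (2 * c1), -c3 / (2 * c1))
radius_squared = (c2 * c2 + c3 * c3) / (4 * c1 * c1) - c4 / c1
print(c1, c2, c3, c4, center, radius_squared)
```

If the three points were collinear, `c1` would be zero and the division in the last step would fail; the sketch does not guard against that case.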


A General Conic Section Through Five Points 


In his monumental work Principia Mathematica, Isaac Newton posed and solved the following problem (Book
I, Proposition 22, Problem 14): “To describe a conic that shall pass through five given points.” Newton solved
this problem geometrically, as shown in Figure 10.1.3, in which he passed an ellipse through the points A, B, D,
P, C; however, the methods of this section can also be applied.


Figure 10.1.3 


The general equation of a conic section in the plane (a parabola, hyperbola, or ellipse, or degenerate forms of
these curves) is given by

\[ c_1 x^2 + c_2 xy + c_3 y^2 + c_4 x + c_5 y + c_6 = 0 \]

This equation contains six coefficients, but we can reduce the number to five if we divide through by any one of
them that is not zero. Thus only five coefficients must be determined, so five distinct points in the plane are
sufficient to determine the equation of the conic section (Figure 10.1.4). As before, the equation can be put in
determinant form (see Exercise 7):

\[
\begin{vmatrix}
x^2 & xy & y^2 & x & y & 1 \\
x_1^2 & x_1 y_1 & y_1^2 & x_1 & y_1 & 1 \\
x_2^2 & x_2 y_2 & y_2^2 & x_2 & y_2 & 1 \\
x_3^2 & x_3 y_3 & y_3^2 & x_3 & y_3 & 1 \\
x_4^2 & x_4 y_4 & y_4^2 & x_4 & y_4 & 1 \\
x_5^2 & x_5 y_5 & y_5^2 & x_5 & y_5 & 1
\end{vmatrix} = 0 \tag{10}
\]


Figure 10.1.4 


EXAMPLE 3 Equation of an Orbit 


An astronomer who wants to determine the orbit of an asteroid about the Sun sets up a Cartesian
coordinate system in the plane of the orbit with the Sun at the origin. Astronomical units of
measurement are used along the axes (1 astronomical unit = mean distance of Earth to Sun = 93
million miles). By Kepler's first law, the orbit must be an ellipse, so the astronomer makes five
observations of the asteroid at five different times and finds five points along the orbit to be

(8.025, 8.310), (10.170, 6.355), (11.202, 3.212), (10.736, 0.375), (9.092, −2.267)

Find the equation of the orbit.


Solution Substituting the coordinates of the five given points into Equation 10 and rounding to three
decimal places give

\[
\begin{vmatrix}
x^2 & xy & y^2 & x & y & 1 \\
64.401 & 66.688 & 69.056 & 8.025 & 8.310 & 1 \\
103.429 & 64.630 & 40.386 & 10.170 & 6.355 & 1 \\
125.485 & 35.981 & 10.317 & 11.202 & 3.212 & 1 \\
115.262 & 4.026 & 0.141 & 10.736 & 0.375 & 1 \\
82.664 & -20.612 & 5.139 & 9.092 & -2.267 & 1
\end{vmatrix} = 0
\]

The cofactor expansion of this determinant along the first row yields

\[ 386.802x^2 - 102.895xy + 446.029y^2 - 2476.443x - 1427.998y - 17109.375 = 0 \]


Figure 10.1.5 is an accurate diagram of the orbit, together with the five given points. 


Figure 10.1.5


A Plane Through Three Points 


In Exercise 8 we ask you to show the following: The plane in 3-space with equation

\[ c_1 x + c_2 y + c_3 z + c_4 = 0 \]

that passes through three noncollinear points \((x_1, y_1, z_1)\), \((x_2, y_2, z_2)\), and \((x_3, y_3, z_3)\) is given by the
determinant equation

\[
\begin{vmatrix}
x & y & z & 1 \\
x_1 & y_1 & z_1 & 1 \\
x_2 & y_2 & z_2 & 1 \\
x_3 & y_3 & z_3 & 1
\end{vmatrix} = 0 \tag{11}
\]


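EXAMPLE 4 Equation of a Plane


The equation of the plane that passes through the three noncollinear points (1, 1, 0), (2, 0, −1),
and (2, 9, 2) is

\[
\begin{vmatrix}
x & y & z & 1 \\
1 & 1 & 0 & 1 \\
2 & 0 & -1 & 1 \\
2 & 9 & 2 & 1
\end{vmatrix} = 0
\]

which reduces to

\[ 2x - y + 3z - 1 = 0 \]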


A Sphere Through Four Points 


In Exercise 9 we ask you to show the following: The sphere in 3-space with equation

\[ c_1(x^2 + y^2 + z^2) + c_2 x + c_3 y + c_4 z + c_5 = 0 \]

that passes through four noncoplanar points \((x_1, y_1, z_1)\), \((x_2, y_2, z_2)\), \((x_3, y_3, z_3)\), and \((x_4, y_4, z_4)\) is given
by the following determinant equation:

\[
\begin{vmatrix}
x^2 + y^2 + z^2 & x & y & z & 1 \\
x_1^2 + y_1^2 + z_1^2 & x_1 & y_1 & z_1 & 1 \\
x_2^2 + y_2^2 + z_2^2 & x_2 & y_2 & z_2 & 1 \\
x_3^2 + y_3^2 + z_3^2 & x_3 & y_3 & z_3 & 1 \\
x_4^2 + y_4^2 + z_4^2 & x_4 & y_4 & z_4 & 1
\end{vmatrix} = 0 \tag{12}
\]


EXAMPLE 5 Equation of a Sphere


The equation of the sphere that passes through the four points (0, 3, 2), (1, −1, 1), (2, 1, 0),
and (5, 1, 3) is

\[
\begin{vmatrix}
x^2 + y^2 + z^2 & x & y & z & 1 \\
13 & 0 & 3 & 2 & 1 \\
3 & 1 & -1 & 1 & 1 \\
5 & 2 & 1 & 0 & 1 \\
35 & 5 & 1 & 3 & 1
\end{vmatrix} = 0
\]

This reduces to

\[ x^2 + y^2 + z^2 - 4x - 2y - 6z + 5 = 0 \]

which in standard form is

\[ (x - 2)^2 + (y - 1)^2 + (z - 3)^2 = 9 \]
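
The plane and sphere examples follow one pattern: build one numeric row per point, then take the cofactors of the symbolic first row. A minimal sketch of that pattern is shown here; `first_row_cofactors`, `plane_through`, and `sphere_through` are hypothetical helper names, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))


def first_row_cofactors(rows):
    """Cofactors of the (symbolic) first row sitting above the numeric rows."""
    width = len(rows) + 1
    return [(-1) ** j * det([r[:j] + r[j + 1:] for r in rows]) for j in range(width)]


def plane_through(points):
    """Coefficients of c1*x + c2*y + c3*z + c4 = 0."""
    rows = [[Fraction(x), Fraction(y), Fraction(z), Fraction(1)] for x, y, z in points]
    return first_row_cofactors(rows)


def sphere_through(points):
    """Coefficients of c1*(x^2 + y^2 + z^2) + c2*x + c3*y + c4*z + c5 = 0."""
    rows = [[Fraction(x * x + y * y + z * z), Fraction(x), Fraction(y),
             Fraction(z), Fraction(1)] for x, y, z in points]
    return first_row_cofactors(rows)


plane = plane_through([(1, 1, 0), (2, 0, -1), (2, 9, 2)])
sphere = sphere_through([(0, 3, 2), (1, -1, 1), (2, 1, 0), (5, 1, 3)])
sphere_normalized = [c / sphere[0] for c in sphere]  # divide by c1, nonzero for noncoplanar points
print(plane, sphere_normalized)
```

The plane coefficients come out as a common multiple of (2, −1, 3, −1), which reflects the fact that the coefficients are determined only up to a multiplicative constant.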


Exercise Set 10.1 


1. Find the equations of the lines that pass through the following points:
(a) (1, −1), (2, 2)
(b) (0, 1), (1, −1)


Answer:
(a) y = 3x − 4
(b) y = −2x + 1


2. Find the equations of the circles that pass through the following points:
(a) (2, 6), (2, 0), (5, 3)
(b) (2, −2), (3, 5), (−4, 6)


Answer:
(a) \(x^2 + y^2 - 4x - 6y + 4 = 0\) or \((x - 2)^2 + (y - 3)^2 = 9\)
(b) \(x^2 + y^2 + 2x - 4y - 20 = 0\) or \((x + 1)^2 + (y - 2)^2 = 25\)


3. Find the equation of the conic section that passes through the points (0, 0), (0, −1), (2, 0), (2, −5), and
(4, −1).


Answer:


\(x^2 + 2xy + y^2 - 2x + y = 0\) (a parabola)


4. Find the equations of the planes in 3-space that pass through the following points: 
a) Ui aL at, De = Te) 


(b) (2, 3, 1), (2, −1, −1), (1, 2, 1)
Answer: 


(a) x+2y+z=0 
(b) −x + y − 2z + 1 = 0
5. (a) Alter Equation 11 so that it determines the plane that passes through the origin and is parallel to the plane 
that passes through three specified noncollinear points. 


(b) Find the two planes described in part (a) corresponding to the triplets of points in Exercises 4(a) and 4(b). 
Answer: 


(a) \[
\begin{vmatrix}
x & y & z & 0 \\
x_1 & y_1 & z_1 & 1 \\
x_2 & y_2 & z_2 & 1 \\
x_3 & y_3 & z_3 & 1
\end{vmatrix} = 0
\]
(b) x + 2y + z = 0; −x + y − 2z = 0
6. Find the equations of the spheres in 3-space that pass through the following points: 
(a) (154, 3), C= 4, 2, 4),10, 1s 2, = 1) 
(by: (0,1 = 200014, 0, tec 7, 0), 3, 1) 


Answer: 


(a) \(x^2 + y^2 + z^2 - 2x - 4y - 2z = -2\) or \((x - 1)^2 + (y - 2)^2 + (z - 1)^2 = 4\)
(b) \(x^2 + y^2 + z^2 - 2x - 2y = 3\) or \((x - 1)^2 + (y - 1)^2 + z^2 = 5\)
7. Show that Equation 10 is the equation of the conic section that passes through five given distinct points in the 


plane. 


8. Show that Equation 11 is the equation of the plane in 3-space that passes through three given noncollinear 
points. 


9. Show that Equation 12 is the equation of the sphere in 3-space that passes through four given noncoplanar 
points. 


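10. Find a determinant equation for the parabola of the form

\[ c_1 y + c_2 x^2 + c_3 x + c_4 = 0 \]

that passes through three given noncollinear points in the plane.


Answer:

\[
\begin{vmatrix}
y & x^2 & x & 1 \\
y_1 & x_1^2 & x_1 & 1 \\
y_2 & x_2^2 & x_2 & 1 \\
y_3 & x_3^2 & x_3 & 1
\end{vmatrix} = 0
\]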


11. What does Equation 9 become if the three distinct points are collinear? 
Answer: 


The equation of the line through the three collinear points 


12. What does Equation 11 become if the three distinct points are collinear? 


Answer: 


0=0 


13. What does Equation 12 become if the four points are coplanar? 
Answer: 


The equation of the plane through the four coplanar points 
Section 10.1 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic 
proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be 
able to use your technology utility to solve many of the problems in the regular exercise sets. 


T1. The general equation of a quadric surface is given by

\[ a_1 x^2 + a_2 y^2 + a_3 z^2 + a_4 xy + a_5 xz + a_6 yz + a_7 x + a_8 y + a_9 z + a_{10} = 0 \]

Given nine points on this surface, it may be possible to determine its equation.


(a) Show that if the nine points \((x_i, y_i, z_i)\) for i = 1, 2, 3, ..., 9 lie on this surface, and if they determine uniquely
the equation of this surface, then its equation can be written in determinant form as

\[
\begin{vmatrix}
x^2 & y^2 & z^2 & xy & xz & yz & x & y & z & 1 \\
x_1^2 & y_1^2 & z_1^2 & x_1 y_1 & x_1 z_1 & y_1 z_1 & x_1 & y_1 & z_1 & 1 \\
x_2^2 & y_2^2 & z_2^2 & x_2 y_2 & x_2 z_2 & y_2 z_2 & x_2 & y_2 & z_2 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
x_9^2 & y_9^2 & z_9^2 & x_9 y_9 & x_9 z_9 & y_9 z_9 & x_9 & y_9 & z_9 & 1
\end{vmatrix} = 0
\]

(b) Use the result in part (a) to determine the equation of the quadric surface that passes through the points

(1, 2, 3), (2, 1, 7), (0, 4, 6), (3, −1, 4), (3, 0, 11), (−1, 5, 8), (9, −8, 3), (4, 5, 3), and
(−2, 6, 10).


T2.
(a) A hyperplane in the n-dimensional Euclidean space \(R^n\) has an equation of the form

\[ a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n + a_{n+1} = 0 \]

where \(a_i\), i = 1, 2, 3, ..., n + 1, are constants, not all zero, and \(x_i\), i = 1, 2, 3, ..., n, are variables for
which

\[ (x_1, x_2, x_3, \ldots, x_n) \in R^n \]

A point

\[ (x_{10}, x_{20}, x_{30}, \ldots, x_{n0}) \in R^n \]

lies on this hyperplane if

\[ a_1 x_{10} + a_2 x_{20} + a_3 x_{30} + \cdots + a_n x_{n0} + a_{n+1} = 0 \]

Given that the n points \((x_{1i}, x_{2i}, x_{3i}, \ldots, x_{ni})\), i = 1, 2, 3, ..., n, lie on this hyperplane and that they
uniquely determine the equation of the hyperplane, show that the equation of the hyperplane can be written
in determinant form as

\[
\begin{vmatrix}
x_1 & x_2 & x_3 & \cdots & x_n & 1 \\
x_{11} & x_{21} & x_{31} & \cdots & x_{n1} & 1 \\
x_{12} & x_{22} & x_{32} & \cdots & x_{n2} & 1 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
x_{1n} & x_{2n} & x_{3n} & \cdots & x_{nn} & 1
\end{vmatrix} = 0
\]

(b) Determine the equation of the hyperplane in \(R^9\) that goes through the following nine points:

(1, 2, 3, 4, 5, 6, 7, 8, 9)   (2, 3, 4, 5, 6, 7, 8, 9, 1)
(3, 4, 5, 6, 7, 8, 9, 1, 2)   (4, 5, 6, 7, 8, 9, 1, 2, 3)
(5, 6, 7, 8, 9, 1, 2, 3, 4)   (6, 7, 8, 9, 1, 2, 3, 4, 5)
(7, 8, 9, 1, 2, 3, 4, 5, 6)   (8, 9, 1, 2, 3, 4, 5, 6, 7)
(9, 1, 2, 3, 4, 5, 6, 7, 8)




10.2 Geometric Linear Programming 


In this section we describe a geometric technique for maximizing or minimizing a linear expression in two 
variables subject to a set of linear constraints. 


Prerequisites 


Linear Systems 


Linear Inequalities 


Linear Programming 


The study of linear programming theory has expanded greatly since the pioneering work of George Dantzig in 
the late 1940s. Today, linear programming is applied to a wide variety of problems in industry and science. In 
this section we present a geometric approach to the solution of simple linear programming problems. Let us 
begin with some examples. 


EXAMPLE 1 Maximizing Sales Revenue


A candy manufacturer has 130 pounds of chocolate-covered cherries and 170 pounds of 
chocolate-covered mints in stock. He decides to sell them in the form of two different mixtures. 
One mixture will contain half cherries and half mints by weight and will sell for $2.00 per 
pound. The other mixture will contain one-third cherries and two-thirds mints by weight and 
will sell for $1.25 per pound. How many pounds of each mixture should the candy 
manufacturer prepare in order to maximize his sales revenue? 


Mathematical Formulation Let the mixture of half cherries and half mints be called mix A,
and let \(x_1\) be the number of pounds of this mixture to be prepared. Let the mixture of one-third
cherries and two-thirds mints be called mix B, and let \(x_2\) be the number of pounds of this
mixture to be prepared. Since mix A sells for $2.00 per pound and mix B sells for $1.25 per
pound, the total sales z (in dollars) will be

\[ z = 2.00x_1 + 1.25x_2 \]

Since each pound of mix A contains 1/2 pound of cherries and each pound of mix B contains 1/3
pound of cherries, the total number of pounds of cherries used in both mixtures is

\[ \tfrac{1}{2} x_1 + \tfrac{1}{3} x_2 \]

Similarly, since each pound of mix A contains 1/2 pound of mints and each pound of mix B
contains 2/3 pound of mints, the total number of pounds of mints used in both mixtures is

\[ \tfrac{1}{2} x_1 + \tfrac{2}{3} x_2 \]

Because the manufacturer can use at most 130 pounds of cherries and 170 pounds of mints, we
must have

\[
\begin{aligned}
\tfrac{1}{2} x_1 + \tfrac{1}{3} x_2 &\le 130 \\
\tfrac{1}{2} x_1 + \tfrac{2}{3} x_2 &\le 170
\end{aligned}
\]

Furthermore, since \(x_1\) and \(x_2\) cannot be negative numbers, we must have

\[ x_1 \ge 0 \quad \text{and} \quad x_2 \ge 0 \]

The problem can therefore be formulated mathematically as follows: Find values of \(x_1\) and \(x_2\)
that maximize

\[ z = 2.00x_1 + 1.25x_2 \]

subject to

\[
\begin{aligned}
\tfrac{1}{2} x_1 + \tfrac{1}{3} x_2 &\le 130 \\
\tfrac{1}{2} x_1 + \tfrac{2}{3} x_2 &\le 170 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

Later in this section we will show how to solve this type of mathematical problem 
geometrically. 


EXAMPLE 2 Maximizing Annual Yield 


A woman has up to $10,000 to invest. Her broker suggests investing in two bonds, A and B. 
Bond A is a rather risky bond with an annual yield of 10%, and bond B is a rather safe bond 
with an annual yield of 7%. After some consideration, she decides to invest at most $6000 in 
bond A, to invest at least $2000 in bond B, and to invest at least as much in bond A as in bond 
B. How should she invest her money in order to maximize her annual yield? 


Mathematical Formulation Let \(x_1\) be the number of dollars to be invested in bond A, and
let \(x_2\) be the number of dollars to be invested in bond B. Since each dollar invested in bond A
earns $.10 per year and each dollar invested in bond B earns $.07 per year, the total dollar
amount z earned each year by both bonds is

\[ z = .10x_1 + .07x_2 \]

The constraints imposed can be formulated mathematically as follows:

Invest no more than $10,000:                     \(x_1 + x_2 \le 10{,}000\)
Invest at most $6000 in bond A:                  \(x_1 \le 6000\)
Invest at least $2000 in bond B:                 \(x_2 \ge 2000\)
Invest at least as much in bond A as in bond B:  \(x_1 \ge x_2\)

We also have the implicit assumption that \(x_1\) and \(x_2\) are nonnegative:

\[ x_1 \ge 0 \quad \text{and} \quad x_2 \ge 0 \]

Thus the complete mathematical formulation of the problem is as follows: Find values of \(x_1\)
and \(x_2\) that maximize

\[ z = .10x_1 + .07x_2 \]

subject to

\[
\begin{aligned}
x_1 + x_2 &\le 10{,}000 \\
x_1 &\le 6000 \\
x_2 &\ge 2000 \\
x_1 - x_2 &\ge 0 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]


EXAMPLE 3 Minimizing Cost


A student desires to design a breakfast of cornflakes and milk that is as economical as possible.
On the basis of what he eats during his other meals, he decides that his breakfast should supply
him with at least 9 grams of protein, at least 1/3 the recommended daily allowance (RDA) of
vitamin D, and at least 1/4 the RDA of calcium. He finds the following nutrition and cost
information on the milk and cornflakes containers:

             Milk          Cornflakes
             (1/2 cup)     (1 ounce)
Cost         7.5 cents     5.0 cents
Protein      4 grams       2 grams
Vitamin D    1/8 of RDA    1/10 of RDA
Calcium      1/6 of RDA    None

In order not to have his mixture too soggy or too dry, the student decides to limit himself to
mixtures that contain 1 to 3 ounces of cornflakes per cup of milk, inclusive. What quantities of
milk and cornflakes should he use to minimize the cost of his breakfast?


Mathematical Formulation Let \(x_1\) be the quantity of milk used (measured in 1/2-cup units),
and let \(x_2\) be the quantity of cornflakes used (measured in 1-ounce units). Then if z is the cost
of the breakfast in cents, we may write the following.

Cost of breakfast:                                   \(z = 7.5x_1 + 5.0x_2\)
At least 9 grams protein:                            \(4x_1 + 2x_2 \ge 9\)
At least 1/3 RDA vitamin D:                          \(\tfrac{1}{8}x_1 + \tfrac{1}{10}x_2 \ge \tfrac{1}{3}\)
At least 1/4 RDA calcium:                            \(\tfrac{1}{6}x_1 \ge \tfrac{1}{4}\)
At least 1 ounce cornflakes per cup
(two 1/2-cups) of milk:                              \(\dfrac{x_2}{x_1} \ge \dfrac{1}{2}\) (or \(x_1 - 2x_2 \le 0\))
At most 3 ounces cornflakes per cup
(two 1/2-cups) of milk:                              \(\dfrac{x_2}{x_1} \le \dfrac{3}{2}\) (or \(3x_1 - 2x_2 \ge 0\))

As before, we also have the implicit assumption that \(x_1 \ge 0\) and \(x_2 \ge 0\). Thus the complete
mathematical formulation of the problem is as follows: Find values of \(x_1\) and \(x_2\) that minimize

\[ z = 7.5x_1 + 5.0x_2 \]

subject to

\[
\begin{aligned}
4x_1 + 2x_2 &\ge 9 \\
\tfrac{1}{8}x_1 + \tfrac{1}{10}x_2 &\ge \tfrac{1}{3} \\
\tfrac{1}{6}x_1 &\ge \tfrac{1}{4} \\
x_1 - 2x_2 &\le 0 \\
3x_1 - 2x_2 &\ge 0 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]


Geometric Solution of Linear Programming Problems 


Each of the preceding three examples is a special case of the following problem. 


Problem 


Find values of \(x_1\) and \(x_2\) that either maximize or minimize

\[ z = c_1 x_1 + c_2 x_2 \tag{1} \]

subject to

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 \;(\le)(\ge)(=)\; & b_1 \\
a_{21}x_1 + a_{22}x_2 \;(\le)(\ge)(=)\; & b_2 \\
\vdots \qquad & \\
a_{m1}x_1 + a_{m2}x_2 \;(\le)(\ge)(=)\; & b_m
\end{aligned}
\tag{2}
\]

and

\[ x_1 \ge 0, \quad x_2 \ge 0 \tag{3} \]

In each of the m conditions of (2), any one of the symbols ≤, ≥, and = may be used.


The problem above is called the general linear programming problem in two variables. The linear function z
in (1) is called the objective function. Equations (2) and (3) are called the constraints; in particular, the equations
in (3) are called the nonnegativity constraints on the variables \(x_1\) and \(x_2\).


We will now show how to solve a linear programming problem in two variables graphically. A pair of values
\((x_1, x_2)\) that satisfy all of the constraints is called a feasible solution. The set of all feasible solutions
determines a subset of the \(x_1x_2\)-plane called the feasible region. Our desire is to find a feasible solution that
maximizes the objective function. Such a solution is called an optimal solution.


To examine the feasible region of a linear programming problem, let us note that each constraint of the form

\[ a_{i1}x_1 + a_{i2}x_2 = b_i \]

defines a line in the \(x_1x_2\)-plane, whereas each constraint of the form

\[ a_{i1}x_1 + a_{i2}x_2 \le b_i \quad \text{or} \quad a_{i1}x_1 + a_{i2}x_2 \ge b_i \]

defines a half-plane that includes its boundary line

\[ a_{i1}x_1 + a_{i2}x_2 = b_i \]

Thus the feasible region is always an intersection of finitely many lines and half-planes. For example, the four
constraints

\[
\begin{aligned}
\tfrac{1}{2} x_1 + \tfrac{1}{3} x_2 &\le 130 \\
\tfrac{1}{2} x_1 + \tfrac{2}{3} x_2 &\le 170 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

of Example 1 define the half-planes illustrated in parts (a), (b), (c), and (d) of Figure 10.2.1. The feasible
region of this problem is thus the intersection of these four half-planes, which is illustrated in Figure 10.2.1e.


Figure 10.2.1 (panels (a) through (e))


It can be shown that the feasible region of a linear programming problem has a boundary consisting of a finite 
number of straight line segments. If the feasible region can be enclosed in a sufficiently large circle, it is 
called bounded (Figure 10.2.1e); otherwise, it is called unbounded (see Figure 10.2.5). If the feasible region 
is empty (contains no points), then the constraints are inconsistent and the linear programming problem has no 
solution (see Figure 10.2.6). 


Those boundary points of a feasible region that are intersections of two of the straight line boundary segments 
are called extreme points. (They are also called corner points and vertex points.) For example, in Figure 
10.2.1e, we see that the feasible region of Example 1 has four extreme points: 


(0,0), (0,255), (180,120), (260, 0) (4) 


The importance of the extreme points of a feasible region is shown by the following theorem. 


THEOREM 10.2.1 Maximum and Minimum Values 


If the feasible region of a linear programming problem is nonempty and bounded, then the objective 
function attains both a maximum and a minimum value, and these occur at extreme points of the 
feasible region. If the feasible region is unbounded, then the objective function may or may not attain 
a maximum or minimum value; however, if it attains a maximum or minimum value, it does so at an 
extreme point. 


Figure 10.2.2 suggests the idea behind the proof of this theorem. Since the objective function

\[ z = c_1 x_1 + c_2 x_2 \]

of a linear programming problem is a linear function of \(x_1\) and \(x_2\), its level curves (the curves along which z
has constant values) are straight lines. As we move in a direction perpendicular to these level curves, the
objective function either increases or decreases monotonically. Within a bounded feasible region, the
maximum and minimum values of z must therefore occur at extreme points, as Figure 10.2.2 indicates.


Figure 10.2.2 (level curves of z, with the directions of increasing and decreasing z and the extreme points where z is minimized and maximized)


In the next few examples we use Theorem 10.2.1 to solve several linear programming problems and illustrate 
the variations in the nature of the solutions that may occur. 


EXAMPLE 4 Example 1 Revisited


Figure 10.2.1e shows that the feasible region of Example 1 is bounded. Consequently, from
Theorem 10.2.1 the objective function

\[ z = 2.00x_1 + 1.25x_2 \]

attains both its minimum and maximum values at extreme points. The four extreme points and
the corresponding values of z are given in the following table.

Extreme Point        Value of
(x1, x2)             z = 2.00x1 + 1.25x2

(0, 0)                 0
(0, 255)             318.75
(180, 120)           510.00
(260, 0)             520.00

We see that the largest value of z is 520.00 and the corresponding optimal solution is (260, 0).
Thus the candy manufacturer attains maximum sales of $520 when he produces 260 pounds of
mixture A and none of mixture B.
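
The table in this example can be reproduced by brute force: intersect every pair of boundary lines, discard the intersections that violate some constraint, and evaluate the objective at what remains. The sketch below does this for the candy problem. The function name `extreme_points` and the convention of writing every constraint as a1·x1 + a2·x2 ≤ b (so that x1 ≥ 0 becomes −x1 ≤ 0) are assumptions made for the illustration.

```python
from fractions import Fraction as Fr
from itertools import combinations


def extreme_points(constraints):
    """Feasible intersections of pairs of boundary lines.

    Each constraint (a1, a2, b) stands for a1*x1 + a2*x2 <= b.
    """
    points = set()
    for (a, b, c), (d, e, f) in combinations(constraints, 2):
        determinant = a * e - b * d
        if determinant == 0:
            continue  # parallel boundary lines never meet
        # Cramer's rule for a*x1 + b*x2 = c, d*x1 + e*x2 = f.
        p = ((c * e - b * f) / determinant, (a * f - c * d) / determinant)
        if all(g * p[0] + h * p[1] <= k for g, h, k in constraints):
            points.add(p)
    return points


constraints = [
    (Fr(1, 2), Fr(1, 3), Fr(130)),  # cherries
    (Fr(1, 2), Fr(2, 3), Fr(170)),  # mints
    (Fr(-1), Fr(0), Fr(0)),         # x1 >= 0
    (Fr(0), Fr(-1), Fr(0)),         # x2 >= 0
]


def revenue(p):
    return 2 * p[0] + Fr(5, 4) * p[1]


corners = extreme_points(constraints)
best = max(corners, key=revenue)
print(sorted(corners), best, revenue(best))
```

Checking every pair of constraints is wasteful for large problems, but for two variables and a handful of constraints it mirrors the graphical method directly.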


EXAMPLE 5 Using Theorem 10.2.1 


Find values of \(x_1\) and \(x_2\) that maximize

\[ z = x_1 + 3x_2 \]

subject to

\[
\begin{aligned}
2x_1 + 3x_2 &\le 24 \\
x_1 - x_2 &\le 7 \\
x_2 &\le 6 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

Solution In Figure 10.2.3 we have drawn the feasible region of this problem. Since it is
bounded, the maximum value of z is attained at one of the five extreme points. The values of
the objective function at the five extreme points are given in the following table.


Figure 10.2.3


Extreme Point        Value of
(x1, x2)             z = x1 + 3x2

(0, 6)               18
(3, 6)               21
(9, 2)               15
(7, 0)                7
(0, 0)                0

From this table, the maximum value of z is 21, which is attained at \(x_1 = 3\) and \(x_2 = 6\).


EXAMPLE 6 Using Theorem 10.2.1


Find values of \(x_1\) and \(x_2\) that maximize

\[ z = 4x_1 + 6x_2 \]

subject to

\[
\begin{aligned}
2x_1 + 3x_2 &\le 24 \\
x_1 - x_2 &\le 7 \\
x_2 &\le 6 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

Solution The constraints in this problem are identical to the constraints in Example 5, so the
feasible region of this problem is also given by Figure 10.2.3. The values of the objective
function at the extreme points are given in the following table.

Extreme Point        Value of
(x1, x2)             z = 4x1 + 6x2

(0, 6)               36
(3, 6)               48
(9, 2)               48
(7, 0)               28
(0, 0)                0

We see that the objective function attains a maximum value of 48 at two adjacent extreme 
points, (3, 6) and (9, 2). This shows that an optimal solution to a linear programming problem 
need not be unique. As we ask you to show in Exercise 10, if the objective function has the 
same value at two adjacent extreme points, it has the same value at all points on the straight line 
boundary segment connecting the two extreme points. Thus, in this example the maximum 


value of z is attained at all points on the straight line segment connecting the extreme points 
(3, 6) and (9, 2). 
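
The claim about the connecting segment is easy to test numerically. The sketch below samples points (1 − t)·p + t·q on the segment from p = (3, 6) to q = (9, 2) and evaluates the objective function of this example at each one; the sampling step of 1/10 is an arbitrary choice for illustration.

```python
from fractions import Fraction


def z(x1, x2):
    """Objective function of Example 6."""
    return 4 * x1 + 6 * x2


p, q = (3, 6), (9, 2)
values = []
for k in range(11):
    t = Fraction(k, 10)
    # Point on the segment from p (t = 0) to q (t = 1).
    x1 = (1 - t) * p[0] + t * q[0]
    x2 = (1 - t) * p[1] + t * q[1]
    values.append(z(x1, x2))

print(values)
```

Every sampled value equals 48, which is what one expects because the segment lies on the boundary line 2x1 + 3x2 = 24 and the objective is twice the left side of that equation.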


EXAMPLE 7 The Feasible Region Is a Line Segment


Find values of \(x_1\) and \(x_2\) that minimize

\[ z = 2x_1 - x_2 \]

subject to

\[
\begin{aligned}
2x_1 + 3x_2 &= 12 \\
2x_1 - 3x_2 &\ge 0 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

Solution In Figure 10.2.4 we have drawn the feasible region of this problem. Because one of
the constraints is an equality constraint, the feasible region is a straight line segment with two
extreme points. The values of z at the two extreme points are given in the following table.


Figure 10.2.4


Extreme Point        Value of
(x1, x2)             z = 2x1 − x2

(3, 2)                4
(6, 0)               12

The minimum value of z is thus 4 and is attained at \(x_1 = 3\) and \(x_2 = 2\).


EXAMPLE 8 Using Theorem 10.2.1 


Find values of \(x_1\) and \(x_2\) that maximize

\[ z = 2x_1 + 5x_2 \]

subject to

\[
\begin{aligned}
2x_1 + x_2 &\ge 8 \\
-4x_1 + x_2 &\le 2 \\
2x_1 - 3x_2 &\le 0 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

Solution The feasible region of this linear programming problem is illustrated in Figure
10.2.5. Since it is unbounded, we are not assured by Theorem 10.2.1 that the objective function
attains a maximum value. In fact, it is easily seen that since the feasible region contains points
for which both \(x_1\) and \(x_2\) are arbitrarily large and positive, the objective function

\[ z = 2x_1 + 5x_2 \]

can be made arbitrarily large and positive. This problem has no optimal solution. Instead, we
say the problem has an unbounded solution.
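
One way to make this concrete, reading the constraints as \(2x_1 + x_2 \ge 8\), \(-4x_1 + x_2 \le 2\), and \(2x_1 - 3x_2 \le 0\), is to follow the points \((t, t)\). The choice of this particular ray is an assumption made for illustration; any feasible ray along which z grows would serve.

\[
\begin{aligned}
2t + t = 3t &\ge 8 \quad \text{whenever } t \ge \tfrac{8}{3}, \\
-4t + t = -3t &\le 2, \\
2t - 3t = -t &\le 0, \\
z(t, t) = 2t + 5t &= 7t \to \infty \quad \text{as } t \to \infty .
\end{aligned}
\]

So every point \((t, t)\) with \(t \ge 8/3\) is feasible, and z exceeds any prescribed bound along this ray.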


Figure 10.2.5


EXAMPLE 9 Using Theorem 10.2.1 


Find values of \(x_1\) and \(x_2\) that maximize

\[ z = -5x_1 + x_2 \]

subject to

\[
\begin{aligned}
2x_1 + x_2 &\ge 8 \\
-4x_1 + x_2 &\le 2 \\
2x_1 - 3x_2 &\le 0 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

Solution The above constraints are the same as those in Example 8, so the feasible region of
this problem is also given by Figure 10.2.5. In Exercise 11 we ask you to show that the
objective function of this problem attains a maximum within the feasible region. By Theorem
10.2.1, this maximum must be attained at an extreme point. The values of z at the two extreme
points of the feasible region are given in the following table.

Extreme Point        Value of
(x1, x2)             z = −5x1 + x2

(1, 6)                 1
(3, 2)               −13

The maximum value of z is thus 1 and is attained at the extreme point \(x_1 = 1\), \(x_2 = 6\).


EXAMPLE 10 Inconsistent Constraints 


Find values of \(x_1\) and \(x_2\) that minimize

\[ z = 3x_1 - 8x_2 \]

subject to

\[
\begin{aligned}
2x_1 - x_2 &\le 4 \\
3x_1 + 11x_2 &\le 33 \\
3x_1 + 4x_2 &\ge 24 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

Solution As can be seen from Figure 10.2.6, the intersection of the five half-planes defined
by the five constraints is empty. This linear programming problem has no feasible solutions
since the constraints are inconsistent.


Figure 10.2.6 There are no points common to all five shaded half-planes.
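
The inconsistency can also be confirmed algebraically without the figure. The short argument below reads the first three constraints as \(2x_1 - x_2 \le 4\), \(3x_1 + 11x_2 \le 33\), and \(3x_1 + 4x_2 \ge 24\); that reading is an assumption consistent with the five half-planes described in the solution.

\[
\begin{aligned}
(3x_1 + 11x_2) - (3x_1 + 4x_2) \le 33 - 24 \quad &\Longrightarrow \quad x_2 \le \tfrac{9}{7}, \\
3x_1 \ge 24 - 4x_2 \ge 24 - \tfrac{36}{7} = \tfrac{132}{7} \quad &\Longrightarrow \quad x_1 \ge \tfrac{44}{7}, \\
2x_1 \le 4 + x_2 \le 4 + \tfrac{9}{7} \quad &\Longrightarrow \quad x_1 \le \tfrac{37}{14} .
\end{aligned}
\]

Since \(37/14 < 44/7\), no value of \(x_1\) satisfies both bounds, so no point satisfies all of the constraints.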


Exercise Set 10.2 


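1. Find values of \(x_1\) and \(x_2\) that maximize

\[ z = 3x_1 + 2x_2 \]

subject to

\[
\begin{aligned}
2x_1 + 3x_2 &\le 6 \\
2x_1 - x_2 &\ge 0 \\
x_1 &\le 2 \\
x_2 &\le 1 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

Answer:


\(x_1 = 2\), \(x_2 = \tfrac{2}{3}\); maximum value of \(z = \tfrac{22}{3}\)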


2. Find values of \(x_1\) and \(x_2\) that minimize

\[ z = 3x_1 - 5x_2 \]

subject to

\[
\begin{aligned}
2x_1 - x_2 &\le -2 \\
4x_1 - x_2 &\ge 0 \\
x_2 &\le 3 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]
Answer: 


No feasible solutions 
3. Find values of \(x_1\) and \(x_2\) that minimize

\[ z = -3x_1 + 2x_2 \]

subject to

\[
\begin{aligned}
3x_1 - x_2 &\ge -5 \\
-x_1 + x_2 &\ge 1 \\
2x_1 + 4x_2 &\ge 12 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]
Answer: 


Unbounded solution 


4. Solve the linear programming problem posed in Example 2. 
Answer: 


Invest $6000 in bond A and $4000 in bond B; the annual yield is $880. 


5. Solve the linear programming problem posed in Example 3. 


Answer: 
7/9 cup of milk, 25/18 ounces of corn flakes; minimum cost = 335/18 ≈ 18.6¢


6. In Example 5 the constraint \(x_1 - x_2 \le 7\) is said to be nonbinding because it can be removed from the
problem without affecting the solution. Likewise, the constraint \(x_2 \le 6\) is said to be binding because
removing it will change the solution.

(a) Which of the remaining constraints are nonbinding and which are binding?
(b) For what values of the right-hand side of the nonbinding constraint \(x_1 - x_2 \le 7\) will this constraint
become binding? For what values will the resulting feasible set be empty?


(c) For what values of the right-hand side of the binding constraint \(x_2 \le 6\) will this constraint become
nonbinding? For what values will the resulting feasible set be empty?


10. 


11. 


Answer: 


(a) \(x_1 \ge 0\) and \(x_2 \ge 0\) are nonbinding; \(2x_1 + 3x_2 \le 24\) is binding


(b) \(x_1 - x_2 \le v\) for \(v < -3\) is binding and for \(v < -6\) yields the empty set.
(c) \(x_2 \le v\) for \(v > 8\) is nonbinding and for \(v < 0\) yields the empty set.


7. A trucking firm ships the containers of two companies, A and B. Each container from company A weighs
40 pounds and is 2 cubic feet in volume. Each container from company B weighs 50 pounds and is 3 cubic
feet in volume. The trucking firm charges company A $2.20 for each container shipped and charges
company B $3.00 for each container shipped. If one of the firm's trucks cannot carry more than 37,000
pounds and cannot hold more than 2000 cubic feet, how many containers from companies A and B should
a truck carry to maximize the shipping charges?

Answer:

550 containers from company A and 300 containers from company B; maximum shipping charges = $2110


8. Repeat Exercise 7 if the trucking firm raises its price for shipping a container from company A to $2.50.

Answer:

925 containers from company A and no containers from company B; maximum shipping charges = $2312.50
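The answers to Exercises 7 and 8 can be checked with a short corner-point search. The sketch below is only an illustration: the function name `maximize_2d` and the encoding of each constraint as a triple `(a, b, r)` meaning a·x + b·y ≤ r are assumptions made here, and the method presumes a bounded, nonempty feasible region. Charges are entered in cents so that the arithmetic stays exact.

```python
from fractions import Fraction
from itertools import combinations

def maximize_2d(objective, constraints):
    """Maximize c1*x + c2*y subject to a*x + b*y <= r for each (a, b, r).

    Corner-point method: intersect every pair of boundary lines, keep the
    feasible intersections, and evaluate the objective at each of them.
    Assumes the feasible region is bounded and nonempty.
    """
    c1, c2 = objective
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:                      # parallel boundary lines never meet
            continue
        x = Fraction(r1 * b2 - r2 * b1, det)   # Cramer's rule
        y = Fraction(a1 * r2 - a2 * r1, det)
        if all(a * x + b * y <= r for a, b, r in constraints):
            z = c1 * x + c2 * y
            if best is None or z > best[0]:
                best = (z, x, y)
    return best

# x = containers from company A, y = containers from company B
truck = [
    (40, 50, 37000),   # weight limit in pounds
    (2, 3, 2000),      # volume limit in cubic feet
    (-1, 0, 0),        # x >= 0
    (0, -1, 0),        # y >= 0
]

z7, x7, y7 = maximize_2d((220, 300), truck)   # Exercise 7, charges in cents
z8, x8, y8 = maximize_2d((250, 300), truck)   # Exercise 8, charges in cents
print(x7, y7, float(z7) / 100)
print(x8, y8, float(z8) / 100)
```

Checking every pair of boundary lines is wasteful for large problems, but with four constraints it mirrors the geometric method of this section directly.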


9. A manufacturer produces sacks of chicken feed from two ingredients, A and B. Each sack is to contain at
least 10 ounces of nutrient $N_1$, at least 8 ounces of nutrient $N_2$, and at least 12 ounces of nutrient $N_3$.
Each pound of ingredient A contains 2 ounces of nutrient $N_1$, 2 ounces of nutrient $N_2$, and 6 ounces of
nutrient $N_3$. Each pound of ingredient B contains 5 ounces of nutrient $N_1$, 3 ounces of nutrient $N_2$, and 4
ounces of nutrient $N_3$. If ingredient A costs 8 cents per pound and ingredient B costs 9 cents per pound,
how much of each ingredient should the manufacturer use in each sack of feed to minimize his costs?

Answer:

0.4 pound of ingredient A and 2.4 pounds of ingredient B; minimum cost = 24.8¢


10. If the objective function of a linear programming problem has the same value at two adjacent extreme
points, show that it has the same value at all points on the straight line segment connecting the two
extreme points. [Hint: If $(x_1', x_2')$ and $(x_1'', x_2'')$ are any two points in the plane, a point $(x_1, x_2)$ lies on
the straight line segment connecting them if

$$x_1 = t x_1' + (1 - t) x_1''$$

and

$$x_2 = t x_2' + (1 - t) x_2''$$

where $t$ is a number in the interval $[0, 1]$.]

11. Show that the objective function in Example 9 attains a maximum value in the feasible set. [Hint:
Examine the level curves of the objective function.]


Section 10.2 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the 
relevant documentation for the particular utility you are using. The goal of these exercises is to provide you 
with a basic proficiency with your technology utility. Once you have mastered the techniques in these 
exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 


T1. Consider the feasible region consisting of $0 \le x$, $0 \le y$ along with the set of inequalities

$$x \cos\left(\frac{(2k+1)\pi}{4n}\right) + y \sin\left(\frac{(2k+1)\pi}{4n}\right) \le \cos\left(\frac{\pi}{4n}\right)$$

for $k = 0, 1, 2, \ldots, n - 1$. Maximize the objective function

$$z = 3x + 4y$$

assuming that (a) $n = 1$, (b) $n = 2$, (c) $n = 3$, (d) $n = 4$, (e) $n = 5$, (f) $n = 6$, (g) $n = 7$, (h) $n = 8$, (i) $n = 9$,
(j) $n = 10$, and (k) $n = 11$. (l) Next, maximize this objective function using the nonlinear feasible region,
$0 \le x$, $0 \le y$, and

$$x^2 + y^2 \le 1$$

(m) Let the results of parts (a) through (k) begin a sequence of values for $z_{\max}$. Do these values approach the
value determined in part (l)? Explain.


T2. Repeat Exercise T1 using the objective function z= x + y. 




10.3 The Earliest Applications of Linear Algebra 


Linear systems can be found in the earliest writings of many ancient civilizations. In this section we give 
some examples of the types of problems that they used to solve. 


Prerequisites 


Linear Systems 


The practical problems of early civilizations included the measurement of land, the distribution of goods, the 
tracking of resources such as wheat and cattle, and taxation and inheritance calculations. In many cases, these 
problems led to linear systems of equations since linearity is one of the simplest relationships that can exist 
among variables. In this section we present examples from five diverse ancient cultures illustrating how they 
used and solved systems of linear equations. We restrict ourselves to examples before A.D. 500. These 
examples consequently predate the development of the field of algebra by Islamic/Arab mathematicians, a 
field that ultimately led in the nineteenth century to the branch of mathematics now called linear algebra. 


EXAMPLE 1 Egypt (about 1650 B.C.)


Problem 40 of the Ahmes Papyrus


The Ahmes (or Rhind) Papyrus is the source of most of our information about ancient Egyptian 
mathematics. This 5-meter-long papyrus contains 84 short mathematical problems, together 
with their solutions, and dates from about 1650 B.C. Problem 40 in this papyrus is the following:


Divide 100 hekats of barley among five men in arithmetic progression so that the sum of 
the two smallest is one-seventh the sum of the three largest. 


Let $a$ be the least amount that any man obtains, and let $d$ be the common difference of the terms
in the arithmetic progression. Then the other four men receive $a + d$, $a + 2d$, $a + 3d$, and
$a + 4d$ hekats. The two conditions of the problem require that

$$\begin{aligned}
a + (a + d) + (a + 2d) + (a + 3d) + (a + 4d) &= 100 \\
\tfrac{1}{7}\left[(a + 2d) + (a + 3d) + (a + 4d)\right] &= a + (a + d)
\end{aligned}$$

These equations reduce to the following system of two equations in two unknowns:

$$\begin{aligned}
5a + 10d &= 100 \\
11a - 2d &= 0
\end{aligned} \tag{1}$$

The solution technique described in the papyrus is known as the method of false position or
false assumption. It begins by assuming some convenient value of $a$ (in our case $a = 1$),
substituting that value into the second equation, and obtaining $d = 11/2$. Substituting $a = 1$
and $d = 11/2$ into the left-hand side of the first equation gives 60, whereas the right-hand side
is 100. Adjusting the initial guess for $a$ by multiplying it by $100/60$ leads to the correct value
$a = 5/3$. Substituting $a = 5/3$ into the second equation then gives $d = 55/6$, so the
quantities of barley received by the five men are $10/6$, $65/6$, $120/6$, $175/6$, and $230/6$
hekats. This technique of guessing a value of an unknown and later adjusting it has been used
by many cultures throughout the ages.
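The false-position steps just described can be traced in a few lines of exact arithmetic. This is a minimal sketch for this particular system only; the function name `false_position` and the use of Python's `Fraction` type are choices made here for illustration and are not part of the papyrus's method.

```python
from fractions import Fraction

def false_position(guess_a=Fraction(1)):
    """Apply the false-position steps to 5a + 10d = 100, 11a - 2d = 0."""
    # The second equation, 11a - 2d = 0, fixes d once a has been guessed.
    d = Fraction(11, 2) * guess_a
    # Left-hand side of the first equation, 5a + 10d, for the guessed values.
    total = 5 * guess_a + 10 * d
    # The second equation is homogeneous, so scaling a and d together keeps
    # it satisfied while the first left-hand side is scaled up to 100.
    scale = Fraction(100) / total
    return guess_a * scale, d * scale

a, d = false_position()
shares = [a + i * d for i in range(5)]
print(a, d)
print(shares)
```

The rescaling works only because one of the two equations has a zero right-hand side; for a general system a single guess and one multiplication would not be enough.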


EXAMPLE 2 Babylonia (1900-1600 B.C.)


The Old Babylonian Empire flourished in Mesopotamia between 1900 and 1600 B.C. Many clay
tablets containing mathematical tables and problems survive from that period, one of which 
(designated Ca MLA 1950) contains the next problem. The statement of the problem is a bit 
muddled because of the condition of the tablet, but the diagram and the solution on the tablet 
indicate that the problem is as follows: 




A trapezoid with an area of 320 square units is cut off from a right triangle by a line 
parallel to one of its sides. The other side has length 50 units, and the height of the 
trapezoid is 20 units. What are the upper and the lower widths of the trapezoid? 


Let $x$ be the lower width of the trapezoid and $y$ its upper width. The area of the trapezoid is its
height times its average width, so $20\left(\frac{x + y}{2}\right) = 320$. Using similar triangles, we also have
$\frac{x}{50} = \frac{y}{30}$. The solution on the tablet uses these relations to generate the linear system

$$\begin{aligned}
\tfrac{1}{2}(x + y) &= 16 \\
\tfrac{1}{2}(x - y) &= 4
\end{aligned} \tag{2}$$

Adding and subtracting these two equations then gives the solution $x = 20$ and $y = 12$.
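The passage from the two geometric relations to the linear system is not spelled out in the text. One way to see it, offered here as a plausible reconstruction and not as the scribe's own reasoning, uses the fact that equal ratios remain equal when numerators and denominators are added or subtracted:

```latex
\begin{aligned}
20\left(\frac{x+y}{2}\right) = 320
  \quad &\Longrightarrow \quad \tfrac{1}{2}(x+y) = 16, \quad x + y = 32 \\[4pt]
\frac{x}{50} = \frac{y}{30}
  \quad &\Longrightarrow \quad \frac{x+y}{80} = \frac{x-y}{20} \\[4pt]
\frac{x-y}{20} = \frac{32}{80} = \frac{2}{5}
  \quad &\Longrightarrow \quad x - y = 8, \quad \tfrac{1}{2}(x-y) = 4
\end{aligned}
```

Adding the two half-sums then gives $x = 16 + 4 = 20$, and subtracting them gives $y = 16 - 4 = 12$.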


EXAMPLE 3 China (A.D. 263)


Chiu Chang Suan Shu in Chinese characters 


The most important treatise in the history of Chinese mathematics is the Chiu Chang Suan Shu, 
or “The Nine Chapters of the Mathematical Art.” This treatise, which is a collection of 246 
problems and their solutions, was assembled in its final form by Liu Hui in A.D. 263. Its 
contents, however, go back to at least the beginning of the Han dynasty in the second century 
B.C. The eighth of its nine chapters, entitled “The Way of Calculating by Arrays,” contains 18 
word problems that lead to linear systems in three to six unknowns. The general solution 
procedure described is almost identical to the Gaussian elimination technique developed in
Europe in the nineteenth century by Carl Friedrich Gauss. The first problem in the eighth
chapter is the following: 


There are three classes of corn, of which three bundles of the first class, two of the 
second, and one of the third make 39 measures. Two of the first, three of the second, and 
one of the third make 34 measures. And one of the first, two of the second, and three of 
the third make 26 measures. How many measures of grain are contained in one bundle 
of each class? 


Let x, y, and z be the measures of the first, second, and third classes of corn. Then the 
conditions of the problem lead to the following linear system of three equations in three 
unknowns: 


3x+2y+z = 39 
2x+3y+z = 34 (3) 
x+2y+3z = 26 


The solution described in the treatise represented the coefficients of each equation by an
appropriate number of rods placed within squares on a counting table. Positive coefficients
were represented by black rods, negative coefficients were represented by red rods, and the
squares corresponding to zero coefficients were left empty. The counting table was laid out as
follows so that the coefficients of each equation appear in columns with the first equation in the
rightmost column:

$$\begin{array}{|c|c|c|}
\hline 1 & 2 & 3 \\
\hline 2 & 3 & 2 \\
\hline 3 & 1 & 1 \\
\hline 26 & 34 & 39 \\
\hline
\end{array}$$

Next, the numbers of rods within the squares were adjusted to accomplish the following two
steps: (1) two times the numbers of the third column were subtracted from three times the
numbers in the second column and (2) the numbers in the third column were subtracted from
three times the numbers in the first column. The result was the following array:

$$\begin{array}{|c|c|c|}
\hline  &  & 3 \\
\hline 4 & 5 & 2 \\
\hline 8 & 1 & 1 \\
\hline 39 & 24 & 39 \\
\hline
\end{array}$$

In this array, four times the numbers in the second column were subtracted from five times the
numbers in the first column, yielding

$$\begin{array}{|c|c|c|}
\hline  &  & 3 \\
\hline  & 5 & 2 \\
\hline 36 & 1 & 1 \\
\hline 99 & 24 & 39 \\
\hline
\end{array}$$

This last array is equivalent to the linear system

$$\begin{aligned}
3x + 2y + z &= 39 \\
5y + z &= 24 \\
36z &= 99
\end{aligned}$$

This triangular system was solved by a method equivalent to back substitution to obtain
$x = 37/4$, $y = 17/4$, and $z = 11/4$.
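The column adjustments described above can be replayed mechanically. In the sketch below each Python list stands for one column of the counting table, read top to bottom; the list layout and the helper name `combine` are illustrative assumptions, not notation from the treatise.

```python
from fractions import Fraction

# Columns of the counting table, left to right. Each column lists the
# coefficients of x, y, z and then the right-hand side of one equation.
cols = [
    [1, 2, 3, 26],   # first (leftmost) column:   x + 2y + 3z = 26
    [2, 3, 1, 34],   # second column:            2x + 3y +  z = 34
    [3, 2, 1, 39],   # third (rightmost) column: 3x + 2y +  z = 39
]

def combine(p, u, q, v):
    """Return p*u - q*v entry by entry (one rod adjustment)."""
    return [p * a - q * b for a, b in zip(u, v)]

cols[1] = combine(3, cols[1], 2, cols[2])   # 3*(second) - 2*(third)
cols[0] = combine(3, cols[0], 1, cols[2])   # 3*(first)  - 1*(third)
cols[0] = combine(5, cols[0], 4, cols[1])   # 5*(first)  - 4*(second)

# Back substitution on the triangular system read off the columns.
z = Fraction(cols[0][3], cols[0][2])
y = (cols[1][3] - cols[1][2] * z) / cols[1][1]
x = (cols[2][3] - cols[2][1] * y - cols[2][2] * z) / cols[2][0]
print(cols)
print(x, y, z)
```

Multiplying both columns by integers before subtracting, instead of dividing by a pivot, keeps every entry an integer until the final back substitution, which suits a table of counting rods.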


EXAMPLE 4 Greece (third century B.C.)


Archimedes c. 287-212 B.C. 


Perhaps the most famous system of linear equations from antiquity is the one associated with 
the first part of Archimedes' celebrated Cattle Problem. This problem supposedly was posed by 
Archimedes as a challenge to his colleague Eratosthenes. No solution has come down to us 
from ancient times, so that it is not known how, or even whether, either of these two geometers 
solved it. 


If thou art diligent and wise, O stranger, compute the number of cattle of the Sun, who 
once upon a time grazed on the fields of the Thrinacian isle of Sicily, divided into four 
herds of different colors, one milk white, another glossy black, a third yellow, and the 
last dappled. In each herd were bulls, mighty in number according to these proportions: 
Understand, stranger, that the white bulls were equal to a half and a third of the black 
together with the whole of the yellow, while the black were equal to the fourth part of 
the dappled and a fifth, together with, once more, the whole of the yellow. Observe 
further that the remaining bulls, the dappled, were equal to a sixth part of the white and 
a seventh, together with all of the yellow. These were the proportions of the cows: The 
white were precisely equal to the third part and a fourth of the whole herd of the black; 
while the black were equal to the fourth part once more of the dappled and with it a
fifth part, when all, including the bulls, went to pasture together. Now the dappled in
four parts were equal in number to a fifth part and a sixth of the yellow herd. Finally 
the yellow were in number equal to a sixth part and a seventh of the white herd. If thou 
canst accurately tell, O stranger, the number of cattle of the Sun, giving separately the 
number of well-fed bulls and again the number of females according to each color, thou 
wouldst not be called unskilled or ignorant of numbers, but not yet shalt thou be 
numbered among the wise. 


The conventional designation of the eight variables in this problem is

W = number of white bulls
B = number of black bulls
Y = number of yellow bulls
D = number of dappled bulls
w = number of white cows
b = number of black cows
y = number of yellow cows
d = number of dappled cows

The problem can now be stated as the following seven homogeneous equations in eight
unknowns:

1. $W = \left(\tfrac{1}{2} + \tfrac{1}{3}\right)B + Y$
(The white bulls were equal to a half and a third of the black [bulls] together with the whole of the yellow [bulls].)

2. $B = \left(\tfrac{1}{4} + \tfrac{1}{5}\right)D + Y$
(The black [bulls] were equal to the fourth part of the dappled [bulls] and a fifth, together with, once more, the whole of the yellow [bulls].)

3. $D = \left(\tfrac{1}{6} + \tfrac{1}{7}\right)W + Y$
(The remaining bulls, the dappled, were equal to a sixth part of the white [bulls] and a seventh, together with all of the yellow [bulls].)

4. $w = \left(\tfrac{1}{3} + \tfrac{1}{4}\right)(B + b)$
(The white [cows] were precisely equal to the third part and a fourth of the whole herd of the black.)

5. $b = \left(\tfrac{1}{4} + \tfrac{1}{5}\right)(D + d)$
(The black [cows] were equal to the fourth part once more of the dappled and with it a fifth part, when all, including the bulls, went to pasture together.)

6. $d = \left(\tfrac{1}{5} + \tfrac{1}{6}\right)(Y + y)$
(The dappled [cows] in four parts [that is, in totality] were equal in number to a fifth part and a sixth of the yellow herd.)

7. $y = \left(\tfrac{1}{6} + \tfrac{1}{7}\right)(W + w)$
(The yellow [cows] were in number equal to a sixth part and a seventh of the white herd.)


As we ask you to show in the exercises, this system has infinitely many solutions of the form

$$\begin{aligned}
W &= 10{,}366{,}482k \\
B &= 7{,}460{,}514k \\
Y &= 4{,}149{,}387k \\
D &= 7{,}358{,}060k \\
w &= 7{,}206{,}360k \\
b &= 4{,}893{,}246k \\
y &= 5{,}439{,}213k \\
d &= 3{,}515{,}820k
\end{aligned} \tag{4}$$

where $k$ is any real number. The values $k = 1, 2, \ldots$ give infinitely many positive integer
solutions to the problem, with $k = 1$ giving the smallest solution.
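A quick way to gain confidence in the solution family (4) is to substitute it back into the seven equations. The following sketch does this with exact fractions; the function names `cattle` and `residuals` and the dictionary layout are assumptions made for illustration.

```python
from fractions import Fraction as F

def cattle(k=1):
    """Herd sizes from Eq. 4 for a given multiplier k."""
    return {
        "W": 10_366_482 * k, "B": 7_460_514 * k,
        "Y": 4_149_387 * k,  "D": 7_358_060 * k,
        "w": 7_206_360 * k,  "b": 4_893_246 * k,
        "y": 5_439_213 * k,  "d": 3_515_820 * k,
    }

def residuals(h):
    """Left side minus right side of each of the seven equations."""
    W, B, Y, D = h["W"], h["B"], h["Y"], h["D"]
    w, b, y, d = h["w"], h["b"], h["y"], h["d"]
    return [
        W - ((F(1, 2) + F(1, 3)) * B + Y),
        B - ((F(1, 4) + F(1, 5)) * D + Y),
        D - ((F(1, 6) + F(1, 7)) * W + Y),
        w - (F(1, 3) + F(1, 4)) * (B + b),
        b - (F(1, 4) + F(1, 5)) * (D + d),
        d - (F(1, 5) + F(1, 6)) * (Y + y),
        y - (F(1, 6) + F(1, 7)) * (W + w),
    ]

herd = cattle(1)
print(residuals(herd))        # every entry should be zero
print(sum(herd.values()))     # total number of cattle for k = 1
```

Because the system is homogeneous, any multiple of a solution is again a solution, so the same check passes for every value of `k`.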


EXAMPLE 5 India (fourth century A.D.) 


Fragment III-5-3v of the Bakhshali Manuscript


The Bakhshali Manuscript is an ancient work of Indian/Hindu mathematics dating from around 
the fourth century A.D., although some of its materials undoubtedly come from many centuries 
before. It consists of about 70 leaves or sheets of birch bark containing mathematical problems 
and their solutions. Many of its problems are so-called equalization problems that lead to 
systems of linear equations. One such problem on the fragment shown is the following: 


One merchant has seven asava horses, a second has nine haya horses, and a third has 
ten camels. They are equally well off in the value of their animals if each gives two 
animals, one to each of the others. Find the price of each animal and the total value of 
the animals possessed by each merchant. 


Let $x$ be the price of an asava horse, let $y$ be the price of a haya horse, let $z$ be the price of a
camel, and let $K$ be the total value of the animals possessed by each merchant. Then the
conditions of the problem lead to the following system of equations:

$$\begin{aligned}
5x + y + z &= K \\
x + 7y + z &= K \\
x + y + 8z &= K
\end{aligned} \tag{5}$$

The method of solution described in the manuscript begins by subtracting the quantity
$(x + y + z)$ from both sides of the three equations to obtain $4x = 6y = 7z = K - (x + y + z)$.
This shows that if the prices $x$, $y$, and $z$ are to be integers, then the quantity $K - (x + y + z)$
must be an integer that is divisible by 4, 6, and 7. The manuscript takes the product of these
three numbers, or 168, for the value of $K - (x + y + z)$, which yields $x = 42$, $y = 28$, and
$z = 24$ for the prices and $K = 262$ for the total value. (See Exercise 6 for more solutions to this
problem.)
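The manuscript's recipe is easy to parameterize by the common value $m = K - (x + y + z)$. The sketch below, in which the names `equalization` and `lcm_all` are illustrative assumptions, reproduces the manuscript's choice $m = 168$ and also tries the least common multiple of 4, 6, and 7.

```python
from functools import reduce
from math import gcd

def lcm_all(*nums):
    """Least common multiple of several positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), nums)

def equalization(m):
    """Prices x, y, z and common wealth K when K - (x + y + z) equals m."""
    x, y, z = m // 4, m // 6, m // 7      # from 4x = 6y = 7z = m
    K = m + x + y + z
    return x, y, z, K

print(equalization(4 * 6 * 7))            # the manuscript's choice, m = 168
print(equalization(lcm_all(4, 6, 7)))     # smallest m divisible by 4, 6, and 7
```

Any multiple of 84 may be used for `m`, which is why the problem has infinitely many integer solutions.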


Exercise Set 10.3 


1. The following lines from Book 12 of Homer's Odyssey relate a precursor of Archimedes' Cattle Problem: 


Thou shalt ascend the isle triangular,
Where many oxen of the Sun are fed,
And fatted flocks. Of oxen fifty head
In every herd feed, and their herds are seven;
And of his fat flocks is their number even.


The last line means that there are as many sheep in all the flocks as there are oxen in all the herds. What is 
the total number of oxen and sheep that belong to the god of the Sun? (This was a difficult problem in 
Homer's day.) 


Answer: 


700 
2. Solve the following problems from the Bakhshali Manuscript. 


(a) B possesses two times as much as A; C has three times as much as A and B together; D has four times 
as much as A, B, and C together. Their total possessions are 300. What is the possession of A? 


(b) B gives 2 times as much as A; C gives 3 times as much as B; D gives 4 times as much as C. Their total 
gift is 132. What is the gift of A? 


Answer: 


(a) 5 
(b) 4 

3. A problem on a Babylonian tablet requires finding the length and width of a rectangle given that the length
and the width add up to 10, while the length and one-fourth of the width add up to 7. The solution
provided on the tablet consists of the following four statements:

Multiply 7 by 4 to obtain 28.
Take away 10 from 28 to obtain 18.
Take one-third of 18 to obtain 6, the length.
Take away 6 from 10 to obtain 4, the width.

Explain how these steps lead to the answer.


4. The following two problems are from “The Nine Chapters of the Mathematical Art.” Solve them using the
array technique described in Example 3.

(a) Five oxen and two sheep are worth 10 units and two oxen and five sheep are worth 8 units. What is the
value of each ox and sheep?

(b) There are three kinds of corn. The grains contained in two, three, and four bundles, respectively, of
these three classes of corn, are not sufficient to make a whole measure. However, if we added to them
one bundle of the second, third, and first classes, respectively, then the grains would become one full
measure in each case. How many measures of grain does each bundle of the different classes contain?

Answer:

(a) Ox, $\tfrac{34}{21}$ units; sheep, $\tfrac{20}{21}$ unit

(b) First kind, $\tfrac{9}{25}$ measure; second kind, $\tfrac{7}{25}$ measure; third kind, $\tfrac{4}{25}$ measure


5. The problem in part (a) is known as the “Flower of Thymaridas,” named after a Pythagorean of the fourth
century B.C.

(a) Given the $n$ numbers $a_1, a_2, \ldots, a_n$, solve for $x_1, x_2, \ldots, x_n$ in the following linear system:

$$\begin{aligned}
x_1 + x_2 + \cdots + x_n &= a_1 \\
x_1 + x_2 &= a_2 \\
x_1 + x_3 &= a_3 \\
&\;\;\vdots \\
x_1 + x_n &= a_n
\end{aligned}$$

(b) Identify a problem in this exercise set that fits the pattern in part (a), and solve it using your general
solution.

Answer:

(a) $x_1 = \dfrac{(a_2 + a_3 + \cdots + a_n) - a_1}{n - 2}$, $\quad x_i = a_i - x_1$, $i = 2, 3, \ldots, n$

(b) Exercise 7(b); gold, $30\tfrac{1}{2}$ minae; brass, $9\tfrac{1}{2}$ minae; tin, $14\tfrac{1}{2}$ minae; iron, $5\tfrac{1}{2}$ minae
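The general solution of part (a) translates into a two-line function. This is a minimal sketch assuming $n \ge 3$; the function name `thymaridas` and the convention that the list `a` holds $a_1, a_2, \ldots, a_n$ in order are choices made here for illustration.

```python
from fractions import Fraction

def thymaridas(a):
    """Solve x1 + ... + xn = a[0] and x1 + xi = a[i-1] for i = 2..n (n >= 3)."""
    n = len(a)
    # Adding the last n-1 equations gives (n-2)*x1 + a1 = a2 + ... + an.
    x1 = Fraction(sum(a[1:]) - a[0], n - 2)
    return [x1] + [ai - x1 for ai in a[1:]]

# Crown of Exercise 7(b): total 60 minae; gold + brass = 40,
# gold + tin = 45, gold + iron = 36.
gold, brass, tin, iron = thymaridas([60, 40, 45, 36])
print(gold, brass, tin, iron)
```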


6. For Example 5 from the Bakhshali Manuscript:

(a) Express Equations 5 as a homogeneous linear system of three equations in four unknowns ($x$, $y$, $z$, and
$K$) and show that the solution set has one arbitrary parameter.

(b) Find the smallest solution for which all four variables are positive integers.

(c) Show that the solution given in Example 5 is included among your solutions.

Answer:

(a)
$$\begin{aligned}
5x + y + z - K &= 0 \\
x + 7y + z - K &= 0 \\
x + y + 8z - K &= 0
\end{aligned}$$

$x = \dfrac{21t}{131}$, $y = \dfrac{14t}{131}$, $z = \dfrac{12t}{131}$, $K = t$, where $t$ is an arbitrary number

(b) Take $t = 131$, so that $x = 21$, $y = 14$, $z = 12$, $K = 131$.

(c) Take $t = 262$, so that $x = 42$, $y = 28$, $z = 24$, $K = 262$.


7. Solve the problems posed in the following three epigrams, which appear in a collection entitled “The 
Greek Anthology,” compiled in part by a scholar named Metrodorus around A.D. 500. Some of its 46 
mathematical problems are believed to date as far back as 600 B.C. [Note: Before solving parts (a) and (c),
you will have to formulate the question.]


(a) I desire my two sons to receive the thousand staters of which I am possessed, but let the fifth part of 
the legitimate one's share exceed by ten the fourth part of what falls to the illegitimate one. 


(b) Make me a crown weighing sixty minae, mixing gold and brass, and with them tin and much-wrought 
iron. Let the gold and brass together form two-thirds, the gold and tin together three-fourths, and the 
gold and iron three-fifths. Tell me how much gold you must put in, how much brass, how much tin, 
and how much iron, so as to make the whole crown weigh sixty minae. 


(c) First person: I have what the second has and the third of what the third has. Second person: I have 
what the third has and the third of what the first has. Third person: And I have ten minae and the third 
of what the second has. 


Answer: 


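(a) Legitimate son, $577\tfrac{7}{9}$ staters; illegitimate son, $422\tfrac{2}{9}$ staters

(b) Gold, $30\tfrac{1}{2}$ minae; brass, $9\tfrac{1}{2}$ minae; tin, $14\tfrac{1}{2}$ minae; iron, $5\tfrac{1}{2}$ minae

(c) First person, 45; second person, $37\tfrac{1}{2}$; third person, $22\tfrac{1}{2}$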


Section 10.3 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the 
relevant documentation for the particular utility you are using. The goal of these exercises is to provide you 
with a basic proficiency with your technology utility. Once you have mastered the techniques in these 
exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 


T1.

(a) Solve Archimedes' Cattle Problem using a symbolic algebra program.

(b) The Cattle Problem has a second part in which two additional conditions are imposed. The first of these
states that “When the white bulls mingled their number with the black, they stood firm, equal in depth and
breadth.” This requires that $W + B$ be a square number, that is, 1, 4, 9, 16, 25, and so on. Show that this
requires that the values of $k$ in Eq. 4 be restricted as follows:

$$k = 4{,}456{,}749\,r^2, \quad r = 1, 2, 3, \ldots$$

and find the smallest total number of cattle that satisfies this second condition.

Remark The second condition imposed in the second part of the Cattle Problem states that “When the
yellow and the dappled bulls were gathered into one herd, they stood in such a manner that their number,
beginning from one, grew slowly greater ’til it completed a triangular figure.” This requires that the quantity
$Y + D$ be a triangular number, that is, a number of the form $1,\ 1 + 2,\ 1 + 2 + 3,\ 1 + 2 + 3 + 4, \ldots$. This
final part of the problem was not completely solved until 1965 when all 206,545 digits of the smallest
number of cattle that satisfies this condition were found using a computer.


T2. The following problem is from “The Nine Chapters of the Mathematical Art” and determines a 
homogeneous linear system of five equations in six unknowns. Show that the system has infinitely many 
solutions, and find the one for which the depth of the well and the lengths of the five ropes are the smallest 
possible positive integers. 


Suppose that five families share a well. Suppose further that 

2 of A's ropes are short of the well's depth by one of B's ropes. 
3 of B's ropes are short of the well's depth by one of C's ropes. 
4 of C's ropes are short of the well's depth by one of D's ropes. 
5 of D's ropes are short of the well's depth by one of E's ropes. 
6 of E's ropes are short of the well's depth by one of A's ropes. 




10.4 Cubic Spline Interpolation 


In this section an artist's drafting aid is used as a physical model for the mathematical problem of finding a curve that passes 
through specified points in the plane. The parameters of the curve are determined by solving a linear system of equations. 


Prerequisites 


Linear Systems 
Matrix Algebra 


Differential Calculus 


Curve Fitting 


Fitting a curve through specified points in the plane is a common problem encountered in analyzing experimental data, in 
ascertaining the relations among variables, and in design work. A ubiquitous application is in the design and description of 
computer and printer fonts, such as PostScript™ and TrueType™ fonts (Figure 10.4.1). In Figure 10.4.2 seven points in the 
xy-plane are displayed, and in Figure 10.4.4 a smooth curve has been drawn that passes through them. A curve that passes 
through a set of points in the plane is said to interpolate those points, and the curve is called an interpolating curve for those 
points. The interpolating curve in Figure 10.4.4 was drawn with the aid of a drafting spline (Figure 10.4.3). This drafting aid 
consists of a thin, flexible strip of wood or other material that is bent to pass through the points to be interpolated. Attached 
sliding weights hold the spline in position while the artist draws the interpolating curve. The drafting spline will serve as the 
physical model for a mathematical theory of interpolation that we will discuss in this section. 


Figure 10.4.1 


Figure 10.4.2 


Figure 10.4.3 


Figure 10.4.4 


Statement of the Problem 


Suppose that we are given $n$ points in the xy-plane,

$$(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n)$$

which we wish to interpolate with a “well-behaved” curve (Figure 10.4.5). For convenience, we take the points to be equally
spaced in the x-direction, although our results can easily be extended to the case of unequally spaced points. If we let the
common distance between the x-coordinates of the points be $h$, then we have

$$x_2 - x_1 = x_3 - x_2 = \cdots = x_n - x_{n-1} = h$$

Let $y = S(x)$, $x_1 \le x \le x_n$, denote the interpolating curve that we seek. We assume that this curve describes the displacement of
a drafting spline that interpolates the $n$ points when the weights holding down the spline are situated precisely at the $n$ points. It
is known from linear beam theory that for small displacements, the fourth derivative of the displacement of a beam is zero along
any interval of the x-axis that contains no external forces acting on the beam. If we treat our drafting spline as a thin beam and
realize that the only external forces acting on it arise from the weights at the $n$ specified points, then it follows that

$$S^{(4)}(x) = 0 \tag{1}$$

for values of $x$ lying in the $n - 1$ open intervals

$$(x_1, x_2),\ (x_2, x_3),\ \ldots,\ (x_{n-1}, x_n)$$

between the $n$ points.


Figure 10.4.5 


We also need the result from linear beam theory that states that for a beam acted upon only by external forces, the displacement
must have two continuous derivatives. In the case of the interpolating curve $y = S(x)$ constructed by the drafting spline, this
means that $S(x)$, $S'(x)$, and $S''(x)$ must be continuous for $x_1 \le x \le x_n$.


The condition that $S''(x)$ be continuous is what causes a drafting spline to produce a pleasing curve, as it results in continuous
curvature. The eye can perceive sudden changes in curvature (that is, discontinuities in $S''(x)$), but sudden changes in higher
derivatives are not discernible. Thus, the condition that $S''(x)$ be continuous is the minimal prerequisite for the interpolating
curve to be perceptible as a single smooth curve, rather than as a series of separate curves pieced together.


To determine the mathematical form of the function $S(x)$, we observe that because $S^{(4)}(x) = 0$ in the intervals between the $n$
specified points, it follows by integrating this equation four times that $S(x)$ must be a cubic polynomial in $x$ in each such
interval. In general, however, $S(x)$ will be a different cubic polynomial in each interval, so $S(x)$ must have the form

$$S(x) = \begin{cases}
S_1(x), & x_1 \le x \le x_2 \\
S_2(x), & x_2 \le x \le x_3 \\
\quad\vdots \\
S_{n-1}(x), & x_{n-1} \le x \le x_n
\end{cases} \tag{2}$$

where $S_1(x), S_2(x), \ldots, S_{n-1}(x)$ are cubic polynomials. For convenience, we will write these in the form

$$\begin{aligned}
S_1(x) &= a_1(x - x_1)^3 + b_1(x - x_1)^2 + c_1(x - x_1) + d_1, & x_1 \le x \le x_2 \\
S_2(x) &= a_2(x - x_2)^3 + b_2(x - x_2)^2 + c_2(x - x_2) + d_2, & x_2 \le x \le x_3 \\
&\;\;\vdots \\
S_{n-1}(x) &= a_{n-1}(x - x_{n-1})^3 + b_{n-1}(x - x_{n-1})^2 + c_{n-1}(x - x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned} \tag{3}$$

The $a_i$'s, $b_i$'s, $c_i$'s, and $d_i$'s constitute a total of $4n - 4$ coefficients that we must determine to specify $S(x)$ completely. If we
choose these coefficients so that $S(x)$ interpolates the $n$ specified points in the plane and $S(x)$, $S'(x)$, and $S''(x)$ are
continuous, then the resulting interpolating curve is called a cubic spline.


Derivation of the Formula of a Cubic Spline 


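From Equations 2 and 3, we have

$$\begin{aligned}
S(x) = S_1(x) &= a_1(x - x_1)^3 + b_1(x - x_1)^2 + c_1(x - x_1) + d_1, & x_1 \le x \le x_2 \\
S(x) = S_2(x) &= a_2(x - x_2)^3 + b_2(x - x_2)^2 + c_2(x - x_2) + d_2, & x_2 \le x \le x_3 \\
&\;\;\vdots \\
S(x) = S_{n-1}(x) &= a_{n-1}(x - x_{n-1})^3 + b_{n-1}(x - x_{n-1})^2 + c_{n-1}(x - x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned} \tag{4}$$

so

$$\begin{aligned}
S'(x) = S_1'(x) &= 3a_1(x - x_1)^2 + 2b_1(x - x_1) + c_1, & x_1 \le x \le x_2 \\
S'(x) = S_2'(x) &= 3a_2(x - x_2)^2 + 2b_2(x - x_2) + c_2, & x_2 \le x \le x_3 \\
&\;\;\vdots \\
S'(x) = S_{n-1}'(x) &= 3a_{n-1}(x - x_{n-1})^2 + 2b_{n-1}(x - x_{n-1}) + c_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned} \tag{5}$$

and

$$\begin{aligned}
S''(x) = S_1''(x) &= 6a_1(x - x_1) + 2b_1, & x_1 \le x \le x_2 \\
S''(x) = S_2''(x) &= 6a_2(x - x_2) + 2b_2, & x_2 \le x \le x_3 \\
&\;\;\vdots \\
S''(x) = S_{n-1}''(x) &= 6a_{n-1}(x - x_{n-1}) + 2b_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned} \tag{6}$$

We will now use these equations and the four properties of cubic splines stated below to express the unknown coefficients $a_i$, $b_i$,
$c_i$, $d_i$, $i = 1, 2, \ldots, n - 1$, in terms of the known coordinates $y_1, y_2, \ldots, y_n$.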


1. $S(x)$ interpolates the points $(x_i, y_i)$, $i = 1, 2, \ldots, n$.

Because $S(x)$ interpolates the points $(x_i, y_i)$, $i = 1, 2, \ldots, n$, we have

$$S(x_1) = y_1,\quad S(x_2) = y_2,\quad \ldots,\quad S(x_n) = y_n \tag{7}$$

From the first $n - 1$ of these equations and 4, we obtain

$$\begin{aligned}
d_1 &= y_1 \\
d_2 &= y_2 \\
&\;\;\vdots \\
d_{n-1} &= y_{n-1}
\end{aligned} \tag{8}$$

From the last equation in 7, the last equation in 4, and the fact that $x_n - x_{n-1} = h$, we obtain

$$a_{n-1}h^3 + b_{n-1}h^2 + c_{n-1}h + d_{n-1} = y_n \tag{9}$$

2. $S(x)$ is continuous on $[x_1, x_n]$.

Because $S(x)$ is continuous for $x_1 \le x \le x_n$, it follows that at each point $x_i$ in the set $x_2, x_3, \ldots, x_{n-1}$ we must have

$$S_{i-1}(x_i) = S_i(x_i), \quad i = 2, 3, \ldots, n - 1 \tag{10}$$

Otherwise, the graphs of $S_{i-1}(x)$ and $S_i(x)$ would not join together to form a continuous curve at $x_i$. When we apply the
interpolating property $S_i(x_i) = y_i$, it follows from 10 that $S_{i-1}(x_i) = y_i$, $i = 2, 3, \ldots, n - 1$, or from 4 that

$$\begin{aligned}
a_1h^3 + b_1h^2 + c_1h + d_1 &= y_2 \\
a_2h^3 + b_2h^2 + c_2h + d_2 &= y_3 \\
&\;\;\vdots \\
a_{n-2}h^3 + b_{n-2}h^2 + c_{n-2}h + d_{n-2} &= y_{n-1}
\end{aligned} \tag{11}$$

3. $S'(x)$ is continuous on $[x_1, x_n]$.

Because $S'(x)$ is continuous for $x_1 \le x \le x_n$, it follows that

$$S_{i-1}'(x_i) = S_i'(x_i), \quad i = 2, 3, \ldots, n - 1$$

or, from 5,

$$\begin{aligned}
3a_1h^2 + 2b_1h + c_1 &= c_2 \\
3a_2h^2 + 2b_2h + c_2 &= c_3 \\
&\;\;\vdots \\
3a_{n-2}h^2 + 2b_{n-2}h + c_{n-2} &= c_{n-1}
\end{aligned} \tag{12}$$

4. $S''(x)$ is continuous on $[x_1, x_n]$.

Because $S''(x)$ is continuous for $x_1 \le x \le x_n$, it follows that

$$S_{i-1}''(x_i) = S_i''(x_i), \quad i = 2, 3, \ldots, n - 1$$

or, from 6,

$$\begin{aligned}
6a_1h + 2b_1 &= 2b_2 \\
6a_2h + 2b_2 &= 2b_3 \\
&\;\;\vdots \\
6a_{n-2}h + 2b_{n-2} &= 2b_{n-1}
\end{aligned} \tag{13}$$
Equations 8, 9, 11, 12, and 13 constitute a system of $4n - 6$ linear equations in the $4n - 4$ unknown coefficients $a_i$, $b_i$, $c_i$, $d_i$,
$i = 1, 2, \ldots, n - 1$. Consequently, we need two more equations to determine these coefficients uniquely. Before obtaining these
additional equations, however, we can simplify our existing system by expressing the unknowns $a_i$, $b_i$, $c_i$, and $d_i$ in terms of
new unknown quantities

$$M_1 = S''(x_1),\quad M_2 = S''(x_2),\quad \ldots,\quad M_n = S''(x_n)$$

and the known quantities

$$y_1, y_2, \ldots, y_n$$

For example, from 6 it follows that

$$M_1 = 2b_1,\quad M_2 = 2b_2,\quad \ldots,\quad M_{n-1} = 2b_{n-1}$$

so

$$b_1 = \tfrac{1}{2}M_1,\quad b_2 = \tfrac{1}{2}M_2,\quad \ldots,\quad b_{n-1} = \tfrac{1}{2}M_{n-1}$$

Moreover, we already know from 8 that

$$d_1 = y_1,\quad d_2 = y_2,\quad \ldots,\quad d_{n-1} = y_{n-1}$$

We leave it as an exercise for you to derive the expressions for the $a_i$'s and $c_i$'s in terms of the $M_i$'s and $y_i$'s. The final result is
as follows:


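THEOREM 10.4.1 Cubic Spline Interpolation


Given $n$ points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ with $x_{i+1} - x_i = h$, $i = 1, 2, \ldots, n - 1$, the cubic spline

$$S(x) = \begin{cases}
a_1(x - x_1)^3 + b_1(x - x_1)^2 + c_1(x - x_1) + d_1, & x_1 \le x \le x_2 \\
a_2(x - x_2)^3 + b_2(x - x_2)^2 + c_2(x - x_2) + d_2, & x_2 \le x \le x_3 \\
\quad\vdots \\
a_{n-1}(x - x_{n-1})^3 + b_{n-1}(x - x_{n-1})^2 + c_{n-1}(x - x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n
\end{cases}$$

that interpolates these points has coefficients given by

$$\begin{aligned}
a_i &= \frac{M_{i+1} - M_i}{6h} \\
b_i &= \frac{M_i}{2} \\
c_i &= \frac{y_{i+1} - y_i}{h} - \frac{(M_{i+1} + 2M_i)h}{6} \\
d_i &= y_i
\end{aligned} \tag{14}$$

for $i = 1, 2, \ldots, n - 1$, where $M_i = S''(x_i)$, $i = 1, 2, \ldots, n$.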
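Formula 14 can be turned into code almost verbatim. The sketch below assumes the values $M_1, \ldots, M_n$ are already known; the function name `spline_coefficients` and the 0-based list layout are illustrative choices. The sample data are taken from $y = x^3$ at $x = 0, 1, 2, 3$, with $M_i = 6x_i$ supplied directly from the second derivative, so each piece should reproduce that same cubic.

```python
def spline_coefficients(y, M, h):
    """Return [(a_i, b_i, c_i, d_i)] from Formula 14 for i = 1..n-1.

    y and M are 0-based Python lists holding y_1..y_n and M_1..M_n,
    and h is the common spacing of the x-coordinates.
    """
    coeffs = []
    for i in range(len(y) - 1):
        a = (M[i + 1] - M[i]) / (6 * h)
        b = M[i] / 2
        c = (y[i + 1] - y[i]) / h - (M[i + 1] + 2 * M[i]) * h / 6
        d = y[i]
        coeffs.append((a, b, c, d))
    return coeffs

# Samples of y = x^3 at x = 0, 1, 2, 3, with M_i = 6 * x_i.
pieces = spline_coefficients([0, 1, 8, 27], [0, 6, 12, 18], 1)
for piece in pieces:
    print(piece)
```

Expanding $(x - x_i)^3 + 3x_i(x - x_i)^2 + 3x_i^2(x - x_i) + x_i^3$ gives $x^3$ on every interval, which is a convenient sanity check on the signs in the formula for $c_i$.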


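From this result, we see that the quantities $M_1, M_2, \ldots, M_n$ uniquely determine the cubic spline. To find these quantities, we
substitute the expressions for $a_i$, $b_i$, and $c_i$ given in 14 into 12. After some algebraic simplification, we obtain
\[
\begin{aligned}
M_1 + 4M_2 + M_3 &= 6(y_1 - 2y_2 + y_3)/h^2 \\
M_2 + 4M_3 + M_4 &= 6(y_2 - 2y_3 + y_4)/h^2 \\
&\;\;\vdots \\
M_{n-2} + 4M_{n-1} + M_n &= 6(y_{n-2} - 2y_{n-1} + y_n)/h^2
\end{aligned}
\tag{15}
\]
or, in matrix form,
\[
\begin{bmatrix}
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 4 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 4 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4 & 1
\end{bmatrix}
\begin{bmatrix} M_1 \\ M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-3} \\ M_{n-2} \\ M_{n-1} \\ M_n \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\
y_2 - 2y_3 + y_4 \\
y_3 - 2y_4 + y_5 \\
\vdots \\
y_{n-4} - 2y_{n-3} + y_{n-2} \\
y_{n-3} - 2y_{n-2} + y_{n-1} \\
y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\]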


This is a linear system of $n - 2$ equations for the $n$ unknowns $M_1, M_2, \ldots, M_n$. Thus, we still need two additional equations to
determine $M_1, M_2, \ldots, M_n$ uniquely. The reason for this is that there are infinitely many cubic splines that interpolate the
given points, so we simply do not have enough conditions to determine a unique cubic spline passing through the points. We
discuss below three possible ways of specifying the two additional conditions required to obtain a unique cubic spline through
the points. (The exercises present two more.) They are summarized in Table 1.
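The algebraic simplification that leads from 12 and 14 to 15 can be sketched as follows for a general index $i$; this is only an outline of the substitution, written with the same symbols as Theorem 10.4.1.

\[
\begin{aligned}
3a_ih^2 + 2b_ih + c_i
&= \frac{h}{2}(M_{i+1} - M_i) + hM_i + \frac{y_{i+1} - y_i}{h} - \frac{h}{6}(M_{i+1} + 2M_i) \\
&= \frac{h}{6}(M_i + 2M_{i+1}) + \frac{y_{i+1} - y_i}{h}, \\
c_{i+1} &= \frac{y_{i+2} - y_{i+1}}{h} - \frac{h}{6}(M_{i+2} + 2M_{i+1}).
\end{aligned}
\]

Setting the two expressions equal, as 12 requires, and multiplying through by $6/h$ gives

\[
M_i + 4M_{i+1} + M_{i+2} = \frac{6(y_i - 2y_{i+1} + y_{i+2})}{h^2}, \qquad i = 1, 2, \ldots, n - 2,
\]

which is the general row of 15.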


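Table 1

In each case the unknown vector is $(M_2, M_3, \ldots, M_{n-1})^T$ and the right-hand side is
\[
\mathbf{r} = \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\]

Natural Spline
Description: The second derivative of the spline is zero at the endpoints.
End conditions: $M_1 = 0$, $M_n = 0$
Linear system:
\[
\begin{bmatrix}
4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & 4
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix} = \mathbf{r}
\]

Parabolic Runout Spline
Description: The spline reduces to a parabolic curve on the first and last intervals.
End conditions: $M_1 = M_2$, $M_n = M_{n-1}$
Linear system:
\[
\begin{bmatrix}
5 & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & 5
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix} = \mathbf{r}
\]

Cubic Runout Spline
Description: The spline is a single cubic curve on the first two and last two intervals.
End conditions: $M_1 = 2M_2 - M_3$, $M_n = 2M_{n-1} - M_{n-2}$
Linear system:
\[
\begin{bmatrix}
6 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 0 & 6
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix} = \mathbf{r}
\]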


The Natural Spline 


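The two simplest mathematical conditions we can impose are
\[ M_1 = M_n = 0 \]
These conditions together with 15 result in an $n \times n$ linear system for $M_1, M_2, \ldots, M_n$, which can be written in matrix form as
\[
\begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} M_1 \\ M_2 \\ M_3 \\ \vdots \\ M_{n-1} \\ M_n \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
0 \\ y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-2} - 2y_{n-1} + y_n \\ 0
\end{bmatrix}
\]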


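For numerical calculations it is more convenient to eliminate $M_1$ and $M_n$ from this system and write
\[
\begin{bmatrix}
4 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\tag{16}
\]
together with
\[ M_1 = 0 \tag{17} \]
\[ M_n = 0 \tag{18} \]
Thus, the $(n - 2) \times (n - 2)$ linear system can be solved for the $n - 2$ coefficients $M_2, M_3, \ldots, M_{n-1}$, and $M_1$ and $M_n$ are
determined by 17 and 18.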
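Because the coefficient matrix in 16 is tridiagonal, the system can be solved by one forward elimination pass and one back substitution pass. The following is a minimal sketch of that idea, not a method prescribed by the text; the names `solve_tridiagonal` and `natural_spline_M` are illustrative, and no row interchanges are performed, which relies on the diagonal entries 4 dominating the off-diagonal entries 1.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system by elimination without row interchanges.

    lower[i] multiplies unknown i-1 in row i (lower[0] is unused);
    upper[i] multiplies unknown i+1 in row i (upper[-1] is unused).
    """
    m = len(diag)
    d = list(diag)
    r = list(rhs)
    for i in range(1, m):
        factor = lower[i] / d[i - 1]
        d[i] -= factor * upper[i - 1]
        r[i] -= factor * r[i - 1]
    x = [0.0] * m
    x[-1] = r[-1] / d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = (r[i] - upper[i] * x[i + 1]) / d[i]
    return x


def natural_spline_M(y, h):
    """Return [M_1, ..., M_n] for the natural spline through equally spaced data."""
    n = len(y)
    if n < 3:
        raise ValueError("need at least three points")
    m = n - 2  # number of interior unknowns M_2, ..., M_{n-1}
    rhs = [6 * (y[i] - 2 * y[i + 1] + y[i + 2]) / h**2 for i in range(m)]
    interior = solve_tridiagonal([1.0] * m, [4.0] * m, [1.0] * m, rhs)
    return [0.0] + interior + [0.0]  # conditions (17) and (18)
```

The elimination costs a number of operations proportional to $n$, which is the practical reason for working with the reduced system rather than a general-purpose solver.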


Physically, the natural spline results when the ends of a drafting spline extend freely beyond the interpolating points without
constraint. The end portions of the spline outside the interpolating points will fall on straight line paths, causing $S''(x)$ to
vanish at the endpoints $x_1$ and $x_n$ and resulting in the mathematical conditions $M_1 = M_n = 0$.

The natural spline tends to flatten the interpolating curve at the endpoints, which may be undesirable. Of course, if it is required
that $S''(x)$ vanish at the endpoints, then the natural spline must be used.


The Parabolic Runout Spline 


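The two additional constraints imposed for this type of spline are
\[ M_1 = M_2 \tag{19} \]
\[ M_n = M_{n-1} \tag{20} \]
If we use the preceding two equations to eliminate $M_1$ and $M_n$ from 15, we obtain the $(n - 2) \times (n - 2)$ linear system
\[
\begin{bmatrix}
5 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 5
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\tag{21}
\]
for $M_2, M_3, \ldots, M_{n-1}$. Once these $n - 2$ values have been determined, $M_1$ and $M_n$ are determined from 19 and 20.
From 14 we see that $M_1 = M_2$ implies that $a_1 = 0$, and $M_n = M_{n-1}$ implies that $a_{n-1} = 0$. Thus, from 3 there are no cubic
terms in the formula for the spline over the end intervals $[x_1, x_2]$ and $[x_{n-1}, x_n]$. Hence, as the name suggests, the parabolic
runout spline reduces to a parabolic curve over these end intervals.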


The Cubic Runout Spline 


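For this type of spline, we impose the two additional conditions
\[ M_1 = 2M_2 - M_3 \tag{22} \]
\[ M_n = 2M_{n-1} - M_{n-2} \tag{23} \]
Using these two equations to eliminate $M_1$ and $M_n$ from 15 results in the following $(n - 2) \times (n - 2)$ linear system for
$M_2, M_3, \ldots, M_{n-1}$:
\[
\begin{bmatrix}
6 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 6
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\tag{24}
\]
After we solve this linear system for $M_2, M_3, \ldots, M_{n-1}$, we can use 22 and 23 to determine $M_1$ and $M_n$.

If we rewrite 22 as
\[ M_2 - M_1 = M_3 - M_2 \]
it follows from 14 that $a_1 = a_2$. Because $S'''(x) = 6a_1$ on $[x_1, x_2]$ and $S'''(x) = 6a_2$ on $[x_2, x_3]$, we see that $S'''(x)$ is
constant over the entire interval $[x_1, x_3]$. Consequently, $S(x)$ consists of a single cubic curve over the interval $[x_1, x_3]$ rather
than two different cubic curves pieced together at $x_2$. [To see this, integrate $S'''(x)$ three times.] A similar analysis shows that
$S(x)$ consists of a single cubic curve over the last two intervals.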


Whereas the natural spline tends to produce an interpolating curve that is flat at the endpoints, the cubic runout spline has the 
opposite tendency: it produces a curve with pronounced curvature at the endpoints. If neither behavior is desired, the parabolic 
runout spline is a reasonable compromise. 
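The three end conditions differ only in the corner entries of the reduced coefficient matrix and in how $M_1$ and $M_n$ are recovered afterward. The following sketch makes that explicit using exact rational arithmetic; the function names, the `kind` strings, and the restriction to $n \ge 4$ are assumptions of this illustration rather than part of the text.

```python
from fractions import Fraction


def interior_matrix(m, kind):
    """Coefficient matrix of the reduced system for M_2, ..., M_{n-1}; m = n - 2."""
    A = [[Fraction(0)] * m for _ in range(m)]
    for i in range(m):
        A[i][i] = Fraction(4)
        if i > 0:
            A[i][i - 1] = Fraction(1)
        if i < m - 1:
            A[i][i + 1] = Fraction(1)
    if kind == "parabolic":      # M_1 = M_2 and M_n = M_{n-1}
        A[0][0] = A[m - 1][m - 1] = Fraction(5)
    elif kind == "cubic":        # M_1 = 2M_2 - M_3 and M_n = 2M_{n-1} - M_{n-2}
        A[0][0] = A[m - 1][m - 1] = Fraction(6)
        A[0][1] = A[m - 1][m - 2] = Fraction(0)
    elif kind != "natural":
        raise ValueError("kind must be 'natural', 'parabolic', or 'cubic'")
    return A


def solve(A, b):
    """Gauss-Jordan elimination on an invertible square system."""
    m = len(A)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def second_derivatives(y, h, kind):
    """Return [M_1, ..., M_n] for the chosen end conditions."""
    n = len(y)
    if n < 4:
        raise ValueError("this sketch assumes n >= 4")
    y = [Fraction(v) for v in y]
    h = Fraction(h)
    m = n - 2
    rhs = [6 * (y[i] - 2 * y[i + 1] + y[i + 2]) / h**2 for i in range(m)]
    mid = solve(interior_matrix(m, kind), rhs)
    if kind == "natural":
        first, last = Fraction(0), Fraction(0)
    elif kind == "parabolic":
        first, last = mid[0], mid[-1]
    else:
        first, last = 2 * mid[0] - mid[1], 2 * mid[-1] - mid[-2]
    return [first] + mid + [last]
```

As a check on the claim about the cubic runout spline, the five points (0, 1), (1, 7), (2, 27), (3, 79), (4, 181) lie on $3x^3 - 2x^2 + 5x + 1$, whose second derivative is $18x - 4$; calling `second_derivatives([1, 7, 27, 79, 181], 1, "cubic")` reproduces exactly those second-derivative values at the nodes.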


EXAMPLE 1 Using a Parabolic Runout Spline


The density of water is well known to reach a maximum at a temperature slightly above freezing. Table 2, from
the Handbook of Chemistry and Physics (CRC Press, 2009), gives the density of water in grams per cubic
centimeter for five equally spaced temperatures from $-10^\circ$C to $30^\circ$C. We will interpolate these five
temperature-density measurements with a parabolic runout spline and attempt to find the maximum density of
water in this range by finding the maximum value on this cubic spline. In the exercises we ask you to perform
similar calculations using a natural spline and a cubic runout spline to interpolate the data points.


Table 2

Temperature (°C) | Density (g/cm^3)
-10 | .99815
0 | .99987
10 | .99973
20 | .99823
30 | .99567

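Set
\[
\begin{aligned}
x_1 &= -10, & y_1 &= .99815 \\
x_2 &= 0, & y_2 &= .99987 \\
x_3 &= 10, & y_3 &= .99973 \\
x_4 &= 20, & y_4 &= .99823 \\
x_5 &= 30, & y_5 &= .99567
\end{aligned}
\]
Then
\[
\begin{aligned}
6[y_1 - 2y_2 + y_3]/h^2 &= -.0001116 \\
6[y_2 - 2y_3 + y_4]/h^2 &= -.0000816 \\
6[y_3 - 2y_4 + y_5]/h^2 &= -.0000636
\end{aligned}
\]
and the linear system 21 for the parabolic runout spline becomes
\[
\begin{bmatrix} 5 & 1 & 0 \\ 1 & 4 & 1 \\ 0 & 1 & 5 \end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ M_4 \end{bmatrix}
=
\begin{bmatrix} -.0001116 \\ -.0000816 \\ -.0000636 \end{bmatrix}
\]
Solving this system yields
\[
\begin{aligned}
M_2 &= -.00001973 \\
M_3 &= -.00001293 \\
M_4 &= -.00001013
\end{aligned}
\]
From 19 and 20, we have
\[
\begin{aligned}
M_1 &= M_2 = -.00001973 \\
M_5 &= M_4 = -.00001013
\end{aligned}
\]
Solving for the $a_i$'s, $b_i$'s, $c_i$'s, and $d_i$'s in 14, we obtain the following expression for the interpolating parabolic
runout spline:
\[
S(x) =
\begin{cases}
-.00000987(x + 10)^2 + .0002707(x + 10) + .99815, & -10 \le x \le 0 \\
.000000113(x - 0)^3 - .00000987(x - 0)^2 + .0000733(x - 0) + .99987, & 0 \le x \le 10 \\
.000000047(x - 10)^3 - .00000647(x - 10)^2 - .0000900(x - 10) + .99973, & 10 \le x \le 20 \\
-.00000507(x - 20)^2 - .0002053(x - 20) + .99823, & 20 \le x \le 30
\end{cases}
\]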

This spline is plotted in Figure 10.4.6. From that figure we see that the maximum is attained in the interval
$[0, 10]$. To find this maximum, we set $S'(x)$ equal to zero in the interval $[0, 10]$:
\[ S'(x) = .000000339x^2 - .0000197x + .0000733 = 0 \]
To three significant digits the root of this quadratic in the interval $[0, 10]$ is $x = 3.99$, and for this value of $x$,
$S(3.99) = 1.00001$. Thus, according to our interpolated estimate, the maximum density of water is
1.00001 g/cm^3 attained at 3.99°C. This agrees well with the experimental maximum density of
1.00000 g/cm^3 attained at 3.98°C. (In the original metric system, the gram was defined as the mass of one cubic
centimeter of water at its maximum density.)
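The arithmetic of this example can be reproduced with a few lines of code. The following is a sketch under the assumptions of the example (five equally spaced points, parabolic runout conditions); the variable names are illustrative, and the $3 \times 3$ system is solved by hand-coded substitution rather than a general solver.

```python
import math

h = 10.0
x = [-10.0, 0.0, 10.0, 20.0, 30.0]
y = [0.99815, 0.99987, 0.99973, 0.99823, 0.99567]

# Right-hand sides 6(y_i - 2y_{i+1} + y_{i+2}) / h^2 of system (21).
r = [6 * (y[i] - 2 * y[i + 1] + y[i + 2]) / h**2 for i in range(3)]

# Solve [[5,1,0],[1,4,1],[0,1,5]] (M2, M3, M4)^T = r.
# Rows 1 and 3 give M2 = (r1 - M3)/5 and M4 = (r3 - M3)/5; substitute into row 2.
M3 = (r[1] - (r[0] + r[2]) / 5) / (4 - 2 / 5)
M2 = (r[0] - M3) / 5
M4 = (r[2] - M3) / 5
M = [M2, M2, M3, M4, M4]  # conditions (19) and (20)

# Coefficients of the piece on [0, 10] from Formula (14), with i = 2.
a = (M[2] - M[1]) / (6 * h)
b = M[1] / 2
c = (y[2] - y[1]) / h - (M[2] + 2 * M[1]) * h / 6
d = y[1]

# Critical points of that piece: 3a t^2 + 2b t + c = 0 with t = x - 0.
disc = math.sqrt((2 * b) ** 2 - 4 * (3 * a) * c)
roots = [(-2 * b - disc) / (6 * a), (-2 * b + disc) / (6 * a)]
t_max = next(t for t in roots if 0 <= t <= 10)

x_max = x[1] + t_max
S_max = ((a * t_max + b) * t_max + c) * t_max + d
```

Rounded as in the text, `x_max` and `S_max` agree with the values 3.99 and 1.00001 quoted above.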


Figure 10.4.6 Density (g/cm^3) versus temperature (°C) for the interpolating parabolic runout spline.


Closing Remarks 


In addition to producing excellent interpolating curves, cubic splines and their generalizations are useful for numerical 
integration and differentiation, for the numerical solution of differential and integral equations, and in optimization theory. 


Exercise Set 10.4 


1. Derive the expressions for $a_i$ and $c_i$ in Equations 14 of Theorem 10.4.1.

2. The six points
(0, .00000), (.2, .19867), (.4, .38942),
(.6, .56464), (.8, .71736), (1.0, .84147)

lie on the graph of $y = \sin x$, where $x$ is in radians.

(a) Find the portion of the parabolic runout spline that interpolates these six points for $.4 \le x \le .6$. Maintain an accuracy of
five decimal places in your calculations.

(b) Calculate $S(.5)$ for the spline you found in part (a). What is the percentage error of $S(.5)$ with respect to the "exact"
value of $\sin(.5) = .47943$?

Answer:

(a) $S(x) = -.12643(x - .4)^3 - .20211(x - .4)^2 + .92158(x - .4) + .38942$
(b) $S(.5) = .47943$; error = 0%
3. The following five points 


(0, 1), (1, 7), (2, 27), (3, 79), (4, 181) 


lie on a single cubic curve. 


(a) Which of the three types of cubic splines (natural, parabolic runout, or cubic runout) would agree exactly with the single 
cubic curve on which the five points lie? 


(b) Determine the cubic spline you chose in part (a), and verify that it is a single cubic curve that interpolates the five points. 


Answer: 


(a) The cubic runout spline 


(b) $S(x) = 3x^3 - 2x^2 + 5x + 1$


4. Repeat the calculations in Example 1 using a natural spline to interpolate the five data points.

Answer:
\[
S(x) =
\begin{cases}
-.00000042(x + 10)^3 + .000214(x + 10) + .99815, & -10 \le x \le 0 \\
.00000024(x)^3 - .0000126(x)^2 + .000088(x) + .99987, & 0 \le x \le 10 \\
-.00000004(x - 10)^3 - .0000054(x - 10)^2 - .000092(x - 10) + .99973, & 10 \le x \le 20 \\
.00000022(x - 20)^3 - .0000066(x - 20)^2 - .000212(x - 20) + .99823, & 20 \le x \le 30
\end{cases}
\]
Maximum at $(x, S(x)) = (3.93, 1.00004)$

5. Repeat the calculations in Example 1 using a cubic runout spline to interpolate the five data points.

Answer:
\[
S(x) =
\begin{cases}
.00000009(x + 10)^3 - .0000121(x + 10)^2 + .000282(x + 10) + .99815, & -10 \le x \le 0 \\
.00000009(x)^3 - .0000093(x)^2 + .000070(x) + .99987, & 0 \le x \le 10 \\
.00000004(x - 10)^3 - .0000066(x - 10)^2 - .000087(x - 10) + .99973, & 10 \le x \le 20 \\
.00000004(x - 20)^3 - .0000053(x - 20)^2 - .000207(x - 20) + .99823, & 20 \le x \le 30
\end{cases}
\]
Maximum at $(x, S(x)) = (4.00, 1.00001)$

6. Consider the five points (0, 0), (.5, 1), (1, 0), (1.5, -1), and (2, 0) on the graph of $y = \sin(\pi x)$.
(a) Use a natural spline to interpolate the data points (0, 0), (.5, 1), and (1, 0).
(b) Use a natural spline to interpolate the data points (.5, 1), (1, 0), and (1.5, -1).

(c) Explain the unusual nature of your result in part (b).

Answer:

(a)
\[
S(x) =
\begin{cases}
-4x^3 + 3x, & 0 \le x \le 0.5 \\
4x^3 - 12x^2 + 9x - 1, & 0.5 \le x \le 1
\end{cases}
\]
(b)
\[
S(x) =
\begin{cases}
2 - 2x, & 0.5 \le x \le 1 \\
2 - 2x, & 1 \le x \le 1.5
\end{cases}
\]
(c) The three data points are collinear.


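7. (The Periodic Spline) If it is known or if it is desired that the $n$ points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ to be interpolated lie
on a single cycle of a periodic curve with period $x_n - x_1$, then an interpolating cubic spline $S(x)$ must satisfy
\[
\begin{aligned}
S(x_1) &= S(x_n) \\
S'(x_1) &= S'(x_n) \\
S''(x_1) &= S''(x_n)
\end{aligned}
\]
(a) Show that these three periodicity conditions require that
\[
\begin{aligned}
y_1 &= y_n \\
M_1 &= M_n \\
4M_1 + M_2 + M_{n-1} &= 6(y_{n-1} - 2y_1 + y_2)/h^2
\end{aligned}
\]
(b) Using the three equations in part (a) and Equations 15, construct an $(n - 1) \times (n - 1)$ linear system for
$M_1, M_2, \ldots, M_{n-1}$ in matrix form.

Answer:

(b)
\[
\begin{bmatrix}
4 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 & 1 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4 & 1 \\
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 & 4
\end{bmatrix}
\begin{bmatrix} M_1 \\ M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_{n-1} - 2y_1 + y_2 \\
y_1 - 2y_2 + y_3 \\
y_2 - 2y_3 + y_4 \\
\vdots \\
y_{n-3} - 2y_{n-2} + y_{n-1} \\
y_{n-2} - 2y_{n-1} + y_1
\end{bmatrix}
\]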


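8. (The Clamped Spline) Suppose that, in addition to the $n$ points to be interpolated, we are given specific values $y'_1$ and $y'_n$ for
the slopes $S'(x_1)$ and $S'(x_n)$ of the interpolating cubic spline at the endpoints $x_1$ and $x_n$.

(a) Show that
\[
\begin{aligned}
2M_1 + M_2 &= 6(y_2 - y_1 - hy'_1)/h^2 \\
2M_n + M_{n-1} &= 6(y_{n-1} - y_n + hy'_n)/h^2
\end{aligned}
\]
(b) Using the equations in part (a) and Equations 15, construct an $n \times n$ linear system for $M_1, M_2, \ldots, M_n$ in matrix form.

Remark The clamped spline described in this exercise is the most accurate type of spline for interpolation work if the
slopes at the endpoints are known or can be estimated.

Answer:

(b)
\[
\begin{bmatrix}
2 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 & 2
\end{bmatrix}
\begin{bmatrix} M_1 \\ M_2 \\ M_3 \\ \vdots \\ M_{n-1} \\ M_n \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
-hy'_1 - y_1 + y_2 \\
y_1 - 2y_2 + y_3 \\
y_2 - 2y_3 + y_4 \\
\vdots \\
y_{n-2} - 2y_{n-1} + y_n \\
y_{n-1} - y_n + hy'_n
\end{bmatrix}
\]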


Section 10.4 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, 
Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some 
linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are 
using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have 
mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the 
regular exercise sets. 


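T1. In the solution of the natural cubic spline problem, it is necessary to solve a system of equations having coefficient matrix
\[
A_n =
\begin{bmatrix}
4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & 4
\end{bmatrix}
\]
If we can present a formula for the inverse of this matrix, then the solution for the natural cubic spline problem can be easily
obtained. In this exercise and the next, we use a computer to discover this formula. Toward this end, we first determine an
expression for the determinant of $A_n$, denoted by the symbol $D_n$. Given that
\[ A_1 = [4] \quad\text{and}\quad A_2 = \begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix} \]
we see that
\[ D_1 = \det(A_1) = \det[4] = 4 \]
and
\[ D_2 = \det(A_2) = \det\begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix} = 15 \]
(a) Use the cofactor expansion of determinants to show that
\[ D_n = 4D_{n-1} - D_{n-2} \]
for $n = 3, 4, 5, \ldots$. This says, for example, that
\[
\begin{aligned}
D_3 &= 4D_2 - D_1 = 4(15) - 4 = 56 \\
D_4 &= 4D_3 - D_2 = 4(56) - 15 = 209
\end{aligned}
\]
and so on. Using a computer, check this result for $5 \le n \le 10$.

(b) By writing
\[ D_n = 4D_{n-1} - D_{n-2} \]
and the identity, $D_{n-1} = D_{n-1}$, in matrix form,
\[
\begin{bmatrix} D_n \\ D_{n-1} \end{bmatrix}
=
\begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} D_{n-1} \\ D_{n-2} \end{bmatrix}
\]
show that
\[
\begin{bmatrix} D_n \\ D_{n-1} \end{bmatrix}
=
\begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2}
\begin{bmatrix} D_2 \\ D_1 \end{bmatrix}
=
\begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2}
\begin{bmatrix} 15 \\ 4 \end{bmatrix}
\]
(c) Use the methods in Section 5.2 and a computer to show that
\[
\begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2}
= \frac{1}{2\sqrt{3}}
\begin{bmatrix}
(2 + \sqrt{3})^{n-1} - (2 - \sqrt{3})^{n-1} & (2 - \sqrt{3})^{n-2} - (2 + \sqrt{3})^{n-2} \\
(2 + \sqrt{3})^{n-2} - (2 - \sqrt{3})^{n-2} & (2 - \sqrt{3})^{n-3} - (2 + \sqrt{3})^{n-3}
\end{bmatrix}
\]
and hence
\[ D_n = \frac{(2 + \sqrt{3})^{n+1} - (2 - \sqrt{3})^{n+1}}{2\sqrt{3}} \]
for $n = 1, 2, 3, \ldots$.

(d) Using a computer, check this result for $1 \le n \le 10$.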
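One way the computer checks in parts (a) and (d) might be organized is sketched below. The three functions are illustrative names, not from the text: the first applies the recurrence, the second evaluates the closed form in floating point, and the third computes $\det(A_n)$ independently as a product of elimination pivots (an approach chosen here only because it needs no matrix library).

```python
import math
from fractions import Fraction


def det_by_recurrence(n):
    """D_n from D_1 = 4, D_2 = 15 and D_n = 4 D_{n-1} - D_{n-2}."""
    if n == 1:
        return 4
    prev, cur = 4, 15
    for _ in range(n - 2):
        prev, cur = cur, 4 * cur - prev
    return cur


def det_closed_form(n):
    """Closed form from part (c), evaluated in floating point."""
    s = math.sqrt(3)
    return ((2 + s) ** (n + 1) - (2 - s) ** (n + 1)) / (2 * s)


def det_by_pivots(n):
    """det(A_n) as the product of the pivots produced by Gaussian elimination."""
    pivot = Fraction(4)
    det = pivot
    for _ in range(n - 1):
        # Eliminating the subdiagonal 1 replaces the next diagonal entry 4
        # by 4 - 1/pivot.
        pivot = 4 - 1 / pivot
        det *= pivot
    return det
```

Comparing the three for $1 \le n \le 10$ exercises both the recurrence and the closed form; the floating-point value should be rounded before it is compared with the integers.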


T2. In this exercise, we determine a formula for calculating $A_n^{-1}$ from $D_k$ for $k = 0, 1, 2, 3, \ldots, n$, assuming that $D_0$ is defined
to be 1.

(a) Use a computer to compute $A_k^{-1}$ for $k = 1, 2, 3, 4$, and 5.
(b) From your results in part (a), discover the conjecture that
\[ A_n^{-1} = [\alpha_{ij}] \]
where $\alpha_{ij} = \alpha_{ji}$ and
\[ \alpha_{ij} = (-1)^{i+j}\left(\frac{D_{n-j}D_{i-1}}{D_n}\right) \]
for $i \le j$.


(c) Use the result in part (b) to compute $A_7^{-1}$ and compare it to the result obtained using the computer.
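The conjectured inverse formula can be tested without floating-point error by multiplying it against $A_n$. The sketch below does this with exact fractions; all function names are illustrative, and the check confirms the formula only for the sizes actually tried, not in general.

```python
from fractions import Fraction


def D(k):
    """D_0 = 1, D_1 = 4, and D_k = 4 D_{k-1} - D_{k-2} for k >= 2."""
    prev, cur = 1, 4
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, cur = cur, 4 * cur - prev
    return cur


def conjectured_inverse(n):
    """Matrix [alpha_ij] built from the conjectured formula (1-based i <= j)."""
    inv = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            value = Fraction((-1) ** (i + j) * D(n - j) * D(i - 1), D(n))
            inv[i - 1][j - 1] = inv[j - 1][i - 1] = value
    return inv


def tridiagonal(n):
    """The matrix A_n with 4 on the diagonal and 1 beside it."""
    return [[4 if i == j else 1 if abs(i - j) == 1 else 0 for j in range(n)]
            for i in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def is_identity(M):
    return all(M[i][j] == (1 if i == j else 0)
               for i in range(len(M)) for j in range(len(M)))
```

For example, `is_identity(matmul(tridiagonal(5), conjectured_inverse(5)))` tests the conjecture for $n = 5$.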




10.5 Markov Chains 


In this section we describe a general model of a system that changes from state to state. We then apply the model 
to several concrete problems. 


Prerequisites 


Linear Systems 
Matrices 


Intuitive Understanding of Limits 


A Markov Process 


Suppose a physical or mathematical system undergoes a process of change such that at any moment it can occupy 
one of a finite number of states. For example, the weather in a certain city could be in one of three possible 
states: sunny, cloudy, or rainy. Or an individual could be in one of four possible emotional states: happy, sad, 
angry, or apprehensive. Suppose that such a system changes with time from one state to another and at scheduled 
times the state of the system is observed. If the state of the system at any observation cannot be predicted with 
certainty, but the probability that a given state occurs can be predicted by just knowing the state of the system at 
the preceding observation, then the process of change is called a Markov chain or Markov process. 


DEFINITION 1 


If a Markov chain has $k$ possible states, which we label as $1, 2, \ldots, k$, then the probability that the system
is in state $i$ at any observation after it was in state $j$ at the preceding observation is denoted by $p_{ij}$ and is
called the transition probability from state $j$ to state $i$. The matrix $P = [p_{ij}]$ is called the transition
matrix of the Markov chain.


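For example, in a three-state Markov chain, the transition matrix has the form
\[
\begin{array}{c}
\text{Preceding State} \\
\begin{array}{ccc} 1 & \;\;2 & \;\;3 \end{array} \\
\begin{bmatrix}
p_{11} & p_{12} & p_{13} \\
p_{21} & p_{22} & p_{23} \\
p_{31} & p_{32} & p_{33}
\end{bmatrix}
\end{array}
\begin{array}{l} \\ \\ 1 \\ 2 \\ 3 \end{array}
\quad \text{New State}
\]
In this matrix, $p_{32}$ is the probability that the system will change from state 2 to state 3, $p_{11}$ is the probability that
the system will still be in state 1 if it was previously in state 1, and so forth.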


EXAMPLE 1 Transition Matrix of the Markov Chain 


A car rental agency has three rental locations, denoted by 1, 2, and 3. A customer may rent a car 
from any of the three locations and return the car to any of the three locations. The manager finds 
that customers return the cars to the various locations according to the following probabilities: 


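\[
\begin{array}{c}
\text{Rented from Location} \\
\begin{array}{ccc} 1 & \;\;2 & \;\;3 \end{array} \\
\begin{bmatrix}
.8 & .3 & .2 \\
.1 & .2 & .6 \\
.1 & .5 & .2
\end{bmatrix}
\end{array}
\begin{array}{l} \\ \\ 1 \\ 2 \\ 3 \end{array}
\quad \text{Returned to Location}
\]
This matrix is the transition matrix of the system considered as a Markov chain. From this matrix,
the probability is .6 that a car rented from location 3 will be returned to location 2, the probability
is .8 that a car rented from location 1 will be returned to location 1, and so forth.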


EXAMPLE 2 Transition Matrix of the Markov Chain


By reviewing its donation records, the alumni office of a college finds that 80% of its alumni who 
contribute to the annual fund one year will also contribute the next year, and 30% of those who do 
not contribute one year will contribute the next. This can be viewed as a Markov chain with two 
states: state 1 corresponds to an alumnus giving a donation in any one year, and state 2 corresponds 
to the alumnus not giving a donation in that year. The transition matrix is
\[ P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \]


In the examples above, the transition matrices of the Markov chains have the property that the entries in any
column sum to 1. This is not accidental. If $P = [p_{ij}]$ is the transition matrix of any Markov chain with $k$ states,
then for each $j$ we must have
\[ p_{1j} + p_{2j} + \cdots + p_{kj} = 1 \tag{1} \]
because if the system is in state $j$ at one observation, it is certain to be in one of the $k$ possible states at the next
observation.

A matrix with property 1 is called a stochastic matrix, a probability matrix, or a Markov matrix. From the
preceding discussion, it follows that the transition matrix for a Markov chain must be a stochastic matrix.
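Property 1 is easy to test mechanically. The following is a minimal sketch; the function name `is_stochastic` and the tolerance `tol` (which absorbs floating-point rounding in the column sums) are choices made for this illustration.

```python
def is_stochastic(P, tol=1e-12):
    """True if P is square, has nonnegative entries, and every column sums to 1."""
    k = len(P)
    if any(len(row) != k for row in P):
        return False
    if any(entry < 0 for row in P for entry in row):
        return False
    return all(abs(sum(P[i][j] for i in range(k)) - 1) <= tol for j in range(k))


car_rental = [[0.8, 0.3, 0.2],
              [0.1, 0.2, 0.6],
              [0.1, 0.5, 0.2]]
alumni = [[0.8, 0.3],
          [0.2, 0.7]]
```

Both matrices from Examples 1 and 2 pass the test, while a matrix such as `[[0.5, 0.5], [0.6, 0.5]]` fails because its first column sums to 1.1.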


In a Markov chain, the state of the system at any observation time cannot generally be determined with certainty. 
The best one can usually do is specify probabilities for each of the possible states. For example, in a Markov
chain with three states, we might describe the possible state of the system at some observation time by a column
vector
\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \]
in which $x_1$ is the probability that the system is in state 1, $x_2$ the probability that it is in state 2, and $x_3$ the
probability that it is in state 3. In general we make the following definition.


DEFINITION 2 
The state vector for an observation of a Markov chain with $k$ states is a column vector $\mathbf{x}$ whose $i$th
component $x_i$ is the probability that the system is in the $i$th state at that time.


Observe that the entries in any state vector for a Markov chain are nonnegative and have a sum of 1. (Why?) A 
column vector that has this property is called a probability vector. 


Let us suppose now that we know the state vector $\mathbf{x}^{(0)}$ for a Markov chain at some initial observation. The
following theorem will enable us to determine the state vectors
\[ \mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(n)}, \ldots \]
at the subsequent observation times.


THEOREM 10.5.1 


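If $P$ is the transition matrix of a Markov chain and $\mathbf{x}^{(n)}$ is the state vector at the $n$th observation, then
\[ \mathbf{x}^{(n+1)} = P\mathbf{x}^{(n)} \]

The proof of this theorem involves ideas from probability theory and will not be given here. From this theorem,
it follows that
\[
\begin{aligned}
\mathbf{x}^{(1)} &= P\mathbf{x}^{(0)} \\
\mathbf{x}^{(2)} &= P\mathbf{x}^{(1)} = P^2\mathbf{x}^{(0)} \\
\mathbf{x}^{(3)} &= P\mathbf{x}^{(2)} = P^3\mathbf{x}^{(0)} \\
&\;\;\vdots \\
\mathbf{x}^{(n)} &= P\mathbf{x}^{(n-1)} = P^n\mathbf{x}^{(0)}
\end{aligned}
\]
In this way, the initial state vector $\mathbf{x}^{(0)}$ and the transition matrix $P$ determine $\mathbf{x}^{(n)}$ for $n = 1, 2, \ldots$.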


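EXAMPLE 3 Example 2 Revisited

The transition matrix in Example 2 was
\[ P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \]
We now construct the probable future donation record of a new graduate who did not give a donation in the
initial year after graduation. For such a graduate the system is initially in state 2 with certainty, so the initial
state vector is
\[ \mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \]
From Theorem 10.5.1 we then have
\[
\begin{aligned}
\mathbf{x}^{(1)} &= P\mathbf{x}^{(0)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} .3 \\ .7 \end{bmatrix} \\
\mathbf{x}^{(2)} &= P\mathbf{x}^{(1)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}\begin{bmatrix} .3 \\ .7 \end{bmatrix} = \begin{bmatrix} .45 \\ .55 \end{bmatrix} \\
\mathbf{x}^{(3)} &= P\mathbf{x}^{(2)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}\begin{bmatrix} .45 \\ .55 \end{bmatrix} = \begin{bmatrix} .525 \\ .475 \end{bmatrix}
\end{aligned}
\]
Thus, after three years the alumnus can be expected to make a donation with probability .525. Beyond three
years, we find the following state vectors (to three decimal places):
\[
\mathbf{x}^{(4)} = \begin{bmatrix} .563 \\ .438 \end{bmatrix}, \quad
\mathbf{x}^{(5)} = \begin{bmatrix} .581 \\ .419 \end{bmatrix}, \quad
\mathbf{x}^{(6)} = \begin{bmatrix} .591 \\ .409 \end{bmatrix}, \quad
\mathbf{x}^{(7)} = \begin{bmatrix} .595 \\ .405 \end{bmatrix},
\]
\[
\mathbf{x}^{(8)} = \begin{bmatrix} .598 \\ .402 \end{bmatrix}, \quad
\mathbf{x}^{(9)} = \begin{bmatrix} .599 \\ .401 \end{bmatrix}, \quad
\mathbf{x}^{(10)} = \begin{bmatrix} .599 \\ .401 \end{bmatrix}, \quad
\mathbf{x}^{(11)} = \begin{bmatrix} .600 \\ .400 \end{bmatrix}
\]
For all $n$ beyond 11, we have
\[ \mathbf{x}^{(n)} = \begin{bmatrix} .600 \\ .400 \end{bmatrix} \]
to three decimal places. In other words, the state vectors converge to a fixed vector as the number of
observations increases. (We will discuss this further below.)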
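The repeated multiplication in this example is a direct application of Theorem 10.5.1 and can be sketched in a few lines. The function names `step` and `state_vectors` are illustrative; matrices are stored as lists of rows.

```python
def step(P, x):
    """Return the product P x for a square matrix P (list of rows) and vector x."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]


def state_vectors(P, x0, n):
    """Return [x(0), x(1), ..., x(n)] with x(m+1) = P x(m)."""
    history = [list(x0)]
    for _ in range(n):
        history.append(step(P, history[-1]))
    return history


P = [[0.8, 0.3],
     [0.2, 0.7]]
history = state_vectors(P, [0.0, 1.0], 11)
```

Here `history[3]` corresponds to $\mathbf{x}^{(3)}$ and `history[11]` to $\mathbf{x}^{(11)}$; rounding the entries to three decimal places allows a comparison with the vectors listed above.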


EXAMPLE 4 Example 1 Revisited

The transition matrix in Example 1 was
\[
P = \begin{bmatrix}
.8 & .3 & .2 \\
.1 & .2 & .6 \\
.1 & .5 & .2
\end{bmatrix}
\]
If a car is rented initially from location 2, then the initial state vector is
\[ \mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \]
Using this vector and Theorem 10.5.1, one obtains the later state vectors listed in Table 1.


Table 1 


For all values of $n$ greater than 11, all state vectors are equal to $\mathbf{x}^{(11)}$ to three decimal places.

Two things should be observed in this example. First, it was not necessary to know how long a customer kept
the car. That is, in a Markov process the time period between observations need not be regular. Second, the
state vectors approach a fixed vector as $n$ increases, just as in the first example.


EXAMPLE 5 Using Theorem 10.5.1 


A traffic officer is assigned to control the traffic at the eight intersections indicated in Figure 10.5.1. 
She is instructed to remain at each intersection for an hour and then to either remain at the same 
intersection or move to a neighboring intersection. To avoid establishing a pattern, she is told to 
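choose her new intersection on a random basis, with each possible choice equally likely. For example,
if she is at intersection 5, her next intersection can be 2, 4, 5, or 8, each with probability $\frac{1}{4}$. Every day
she starts at the location where she stopped the day before. The transition matrix for this Markov chain
is
\[
\begin{array}{c}
\text{Old Intersection} \\
\begin{array}{cccccccc} 1 & \;\;2 & \;\;3 & \;\;4 & \;\;5 & \;\;6 & \;\;7 & \;\;8 \end{array} \\
\begin{bmatrix}
\frac13 & \frac13 & 0 & \frac15 & 0 & 0 & 0 & 0 \\
\frac13 & \frac13 & 0 & 0 & \frac14 & 0 & 0 & 0 \\
0 & 0 & \frac13 & \frac15 & 0 & \frac13 & 0 & 0 \\
\frac13 & 0 & \frac13 & \frac15 & \frac14 & 0 & \frac14 & 0 \\
0 & \frac13 & 0 & \frac15 & \frac14 & 0 & 0 & \frac13 \\
0 & 0 & \frac13 & 0 & 0 & \frac13 & \frac14 & 0 \\
0 & 0 & 0 & \frac15 & 0 & \frac13 & \frac14 & \frac13 \\
0 & 0 & 0 & 0 & \frac14 & 0 & \frac14 & \frac13
\end{bmatrix}
\end{array}
\begin{array}{l} \\ \\ 1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 6 \\ 7 \\ 8 \end{array}
\quad \text{New Intersection}
\]

Figure 10.5.1

If the traffic officer begins at intersection 5, her probable locations, hour by hour, are given by the
state vectors given in Table 2. For all values of $n$ greater than 22, all state vectors are equal to $\mathbf{x}^{(22)}$ to
three decimal places. Thus, as with the first two examples, the state vectors approach a fixed vector as
$n$ increases.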
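A transition matrix of this kind can be generated from a list of neighboring intersections instead of being typed in entry by entry. The sketch below does so with exact fractions. The neighbor lists are an assumption read off the street layout; only the neighbors of intersection 5 (namely 2, 4, and 8) are stated explicitly in the text, and the names `NEIGHBORS`, `transition_matrix`, and `step` are illustrative.

```python
from fractions import Fraction

# Assumed adjacency of the eight intersections (hypothetical except for 5).
NEIGHBORS = {
    1: [2, 4], 2: [1, 5], 3: [4, 6], 4: [1, 3, 5, 7],
    5: [2, 4, 8], 6: [3, 7], 7: [4, 6, 8], 8: [5, 7],
}


def transition_matrix(neighbors):
    """Column j holds equal probabilities for staying at j or moving to a neighbor."""
    k = len(neighbors)
    P = [[Fraction(0)] * k for _ in range(k)]
    for old, adjacent in neighbors.items():
        choices = [old] + list(adjacent)  # she may also remain where she is
        for new in choices:
            P[new - 1][old - 1] = Fraction(1, len(choices))
    return P


def step(P, x):
    """Return the product P x."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]


P = transition_matrix(NEIGHBORS)
x0 = [Fraction(int(i == 5)) for i in range(1, 9)]  # officer starts at intersection 5
x1 = step(P, x0)
```

With this adjacency, `x1` assigns probability 1/4 to each of intersections 2, 4, 5, and 8, in agreement with the description of the officer's first move.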


Table 2

$\mathbf{x}^{(1)} = [\,.000 \;\; .250 \;\; .000 \;\; .250 \;\; .250 \;\; .000 \;\; .000 \;\; .250\,]^T$

Limiting Behavior of the State Vectors 


In our examples we saw that the state vectors approached some fixed vector as the number of observations 
increased. We now ask whether the state vectors always approach a fixed vector in a Markov chain. A simple 
example shows that this is not the case. 


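EXAMPLE 6 System Oscillates Between Two State Vectors

Let
\[ P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \quad\text{and}\quad \mathbf{x}^{(0)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \]
Then, because $P^2 = I$ and $P^3 = P$, we have that
\[ \mathbf{x}^{(0)} = \mathbf{x}^{(2)} = \mathbf{x}^{(4)} = \cdots = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \]
and
\[ \mathbf{x}^{(1)} = \mathbf{x}^{(3)} = \mathbf{x}^{(5)} = \cdots = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \]
This system oscillates indefinitely between the two state vectors $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, so it does not
approach any fixed vector.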


However, if we impose a mild condition on the transition matrix, we can show that a fixed limiting state vector is 
approached. This condition is described by the following definition. 


DEFINITION 3 


A transition matrix is regular if some integer power of it has all positive entries. 


Thus, for a regular transition matrix $P$, there is some positive integer $m$ such that all entries of $P^m$ are positive.
This is the case with the transition matrices of Examples 1 and 2 for $m = 1$. In Example 5 it turns out that $P^4$ has
all positive entries. Consequently, in all three examples the transition matrices are regular.
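Whether some power of $P$ has all positive entries depends only on which entries of $P$ are nonzero, so regularity can be tested with true/false patterns instead of numerical powers. The sketch below takes that approach; the function name is illustrative, and the default search limit $(k-1)^2 + 1$ is Wielandt's bound on the exponent of a primitive $k \times k$ matrix, a result from outside this text.

```python
def regularity_exponent(P, max_power=None):
    """Return the smallest m with all entries of P^m positive, or None if none is found."""
    k = len(P)
    if max_power is None:
        max_power = (k - 1) ** 2 + 1  # Wielandt's bound, not from the text
    base = [[entry > 0 for entry in row] for row in P]
    power = [row[:] for row in base]  # positivity pattern of P^m, starting at m = 1
    for m in range(1, max_power + 1):
        if all(all(row) for row in power):
            return m
        # Entry (i, j) of P^(m+1) is positive exactly when some t has
        # (P^m)_{it} > 0 and P_{tj} > 0, because all entries are nonnegative.
        power = [[any(power[i][t] and base[t][j] for t in range(k))
                  for j in range(k)] for i in range(k)]
    return None
```

For the matrix of Example 2 the function returns 1, and for the oscillating matrix of Example 6 it returns `None`.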


A Markov chain that is governed by a regular transition matrix is called a regular Markov chain. We will see
that every regular Markov chain has a fixed state vector $\mathbf{q}$ such that $P^n\mathbf{x}^{(0)}$ approaches $\mathbf{q}$ as $n$ increases for any
choice of $\mathbf{x}^{(0)}$. This result is of major importance in the theory of Markov chains. It is based on the following
theorem.


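THEOREM 10.5.2 Behavior of $P^n$ as $n \to \infty$

If $P$ is a regular transition matrix, then as $n \to \infty$,
\[
P^n \to
\begin{bmatrix}
q_1 & q_1 & \cdots & q_1 \\
q_2 & q_2 & \cdots & q_2 \\
\vdots & \vdots & & \vdots \\
q_k & q_k & \cdots & q_k
\end{bmatrix}
\]
where the $q_i$ are positive numbers such that $q_1 + q_2 + \cdots + q_k = 1$.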


We will not prove this theorem here. We refer you to a more specialized text, such as J. Kemeny and J. Snell, 
Finite Markov Chains (New York: Springer-Verlag, 1976). 


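Let us set
\[
Q = \begin{bmatrix}
q_1 & q_1 & \cdots & q_1 \\
q_2 & q_2 & \cdots & q_2 \\
\vdots & \vdots & & \vdots \\
q_k & q_k & \cdots & q_k
\end{bmatrix}
\quad\text{and}\quad
\mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix}
\]
Thus, $Q$ is a transition matrix, all of whose columns are equal to the probability vector $\mathbf{q}$. $Q$ has the property that
if $\mathbf{x}$ is any probability vector, then
\[
Q\mathbf{x} =
\begin{bmatrix}
q_1 & q_1 & \cdots & q_1 \\
q_2 & q_2 & \cdots & q_2 \\
\vdots & \vdots & & \vdots \\
q_k & q_k & \cdots & q_k
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}
=
\begin{bmatrix}
q_1x_1 + q_1x_2 + \cdots + q_1x_k \\
q_2x_1 + q_2x_2 + \cdots + q_2x_k \\
\vdots \\
q_kx_1 + q_kx_2 + \cdots + q_kx_k
\end{bmatrix}
= (x_1 + x_2 + \cdots + x_k)
\begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix}
= (1)\mathbf{q} = \mathbf{q}
\]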

That is, Q transforms any probability vector x into the fixed probability vector q. This result leads to the 
following theorem. 


THEOREM 10.5.3 Behavior of $P^n\mathbf{x}$ as $n \to \infty$

If $P$ is a regular transition matrix and $\mathbf{x}$ is any probability vector, then as $n \to \infty$,
\[
P^n\mathbf{x} \to \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix} = \mathbf{q}
\]
where $\mathbf{q}$ is a fixed probability vector, independent of $n$, all of whose entries are positive.

This result holds since Theorem 10.5.2 implies that $P^n \to Q$ as $n \to \infty$. This in turn implies that $P^n\mathbf{x} \to Q\mathbf{x} = \mathbf{q}$
as $n \to \infty$. Thus, for a regular Markov chain, the system eventually approaches a fixed state vector $\mathbf{q}$. The vector
$\mathbf{q}$ is called the steady-state vector of the regular Markov chain.


For systems with many states, usually the most efficient technique of computing the steady-state vector $\mathbf{q}$ is
simply to calculate $P^n\mathbf{x}$ for some large $n$. Our examples illustrate this procedure. Each is a regular Markov
process, so that convergence to a steady-state vector is ensured. Another way of computing the steady-state
vector is to make use of the following theorem.


THEOREM 10.5.4 Steady-State Vector 


The steady-state vector q of a regular transition matrix P is the unique probability vector that satisfies the 
equation Pq = q. 


To see this, consider the matrix identity $PP^n = P^{n+1}$. By Theorem 10.5.2, both $P^n$ and $P^{n+1}$ approach $Q$ as
$n \to \infty$. Thus, we have $PQ = Q$. Any one column of this matrix equation gives $P\mathbf{q} = \mathbf{q}$. To show that $\mathbf{q}$ is the
only probability vector that satisfies this equation, suppose $\mathbf{r}$ is another probability vector such that $P\mathbf{r} = \mathbf{r}$. Then
also $P^n\mathbf{r} = \mathbf{r}$ for $n = 1, 2, \ldots$. When we let $n \to \infty$, Theorem 10.5.3 leads to $\mathbf{q} = \mathbf{r}$.

Theorem 10.5.4 can also be expressed by the statement that the homogeneous linear system
\[ (I - P)\mathbf{q} = \mathbf{0} \]
has a unique solution vector $\mathbf{q}$ with nonnegative entries that satisfy the condition $q_1 + q_2 + \cdots + q_k = 1$. We can
apply this technique to the computation of the steady-state vectors for our examples.
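The computation suggested by Theorem 10.5.4 can be sketched as follows. Because the columns of $I - P$ each sum to zero, one of its rows is redundant; this sketch replaces the last row by the normalization $q_1 + q_2 + \cdots + q_k = 1$ and solves the resulting square system exactly. The function name `steady_state` and the choice of which row to replace are assumptions of the illustration, and the routine presumes that $P$ is regular so that the solution is unique.

```python
from fractions import Fraction


def steady_state(P):
    """Solve (I - P) q = 0 together with q_1 + ... + q_k = 1 using exact fractions."""
    k = len(P)
    A = [[Fraction(int(i == j)) - Fraction(P[i][j]) for j in range(k)]
         for i in range(k)]
    b = [Fraction(0)] * k
    A[-1] = [Fraction(1)] * k  # replace the redundant last row by the normalization
    b[-1] = Fraction(1)
    aug = [row + [rhs] for row, rhs in zip(A, b)]
    for col in range(k):
        # Gauss-Jordan elimination; a nonzero pivot exists when the solution is unique.
        pivot = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


tenth = Fraction(1, 10)
alumni = [[8 * tenth, 3 * tenth],
          [2 * tenth, 7 * tenth]]
q = steady_state(alumni)
```

Entering the probabilities as fractions rather than floating-point numbers keeps the answer exact; for the alumni matrix the result is the vector with entries 3/5 and 2/5.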


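EXAMPLE 7 Example 2 Revisited

In Example 2 the transition matrix was
\[ P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \]
so the linear system $(I - P)\mathbf{q} = \mathbf{0}$ is
\[
\begin{bmatrix} .2 & -.3 \\ -.2 & .3 \end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
\tag{2}
\]
This leads to the single independent equation
\[ .2q_1 - .3q_2 = 0 \]
or
\[ q_1 = 1.5q_2 \]
Thus, when we set $q_2 = s$, any solution of 2 is of the form
\[ \mathbf{q} = s\begin{bmatrix} 1.5 \\ 1 \end{bmatrix} \]
where $s$ is an arbitrary constant. To make the vector $\mathbf{q}$ a probability vector, we set
$s = 1/(1.5 + 1) = .4$. Consequently,
\[ \mathbf{q} = \begin{bmatrix} .6 \\ .4 \end{bmatrix} \]
is the steady-state vector of this regular Markov chain. This means that over the long run, 60% of
the alumni will give a donation in any one year, and 40% will not. Observe that this agrees with the
result obtained numerically in Example 3.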


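EXAMPLE 8 Example 1 Revisited

In Example 1 the transition matrix was
\[
P = \begin{bmatrix}
.8 & .3 & .2 \\
.1 & .2 & .6 \\
.1 & .5 & .2
\end{bmatrix}
\]
so the linear system $(I - P)\mathbf{q} = \mathbf{0}$ is
\[
\begin{bmatrix}
.2 & -.3 & -.2 \\
-.1 & .8 & -.6 \\
-.1 & -.5 & .8
\end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]
The reduced row echelon form of the coefficient matrix is (verify)
\[
\begin{bmatrix}
1 & 0 & -\frac{34}{13} \\
0 & 1 & -\frac{14}{13} \\
0 & 0 & 0
\end{bmatrix}
\]
so the original linear system is equivalent to the system
\[
\begin{aligned}
q_1 &= \tfrac{34}{13}q_3 \\
q_2 &= \tfrac{14}{13}q_3
\end{aligned}
\]
When we set $q_3 = s$, any solution of the linear system is of the form
\[ \mathbf{q} = s\begin{bmatrix} \frac{34}{13} \\ \frac{14}{13} \\ 1 \end{bmatrix} \]
To make this a probability vector, we set
\[ s = \frac{1}{\frac{34}{13} + \frac{14}{13} + 1} = \frac{13}{61} \]
Thus, the steady-state vector of the system is
\[
\mathbf{q} = \begin{bmatrix} \frac{34}{61} \\ \frac{14}{61} \\ \frac{13}{61} \end{bmatrix}
= \begin{bmatrix} .5573\ldots \\ .2295\ldots \\ .2131\ldots \end{bmatrix}
\]
This agrees with the result obtained numerically in Table 1. The entries of $\mathbf{q}$ give the long-run
probabilities that any one car will be returned to location 1, 2, or 3, respectively. If the car rental
agency has a fleet of 1000 cars, it should design its facilities so that there are at least 558 spaces at
location 1, at least 230 spaces at location 2, and at least 214 spaces at location 3.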


EXAMPLE 9 Example 5 Revisited

We will not give the details of the calculations but simply state that the unique probability vector
solution of the linear system $(I - P)\mathbf{q} = \mathbf{0}$ is
\[
\mathbf{q} =
\begin{bmatrix}
\frac{3}{28} \\ \frac{3}{28} \\ \frac{3}{28} \\ \frac{5}{28} \\ \frac{4}{28} \\ \frac{3}{28} \\ \frac{4}{28} \\ \frac{3}{28}
\end{bmatrix}
=
\begin{bmatrix}
.1071\ldots \\ .1071\ldots \\ .1071\ldots \\ .1785\ldots \\ .1428\ldots \\ .1071\ldots \\ .1428\ldots \\ .1071\ldots
\end{bmatrix}
\]
The entries in this vector indicate the proportion of time the traffic officer spends at each
intersection over the long term. Thus, if the objective is for her to spend the same proportion of
time at each intersection, then the strategy of random movement with equal probabilities from one
intersection to another is not a good one. (See Exercise 5.)


Exercise Set 10.5 


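1. Consider the transition matrix
\[ P = \begin{bmatrix} .4 & .5 \\ .6 & .5 \end{bmatrix} \]
(a) Calculate $\mathbf{x}^{(n)}$ for $n = 1, 2, 3, 4, 5$ if $\mathbf{x}^{(0)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.

(b) State why $P$ is regular and find its steady-state vector.

Answer:
(a) $\mathbf{x}^{(1)} = \begin{bmatrix} .4 \\ .6 \end{bmatrix}$, $\mathbf{x}^{(2)} = \begin{bmatrix} .46 \\ .54 \end{bmatrix}$, $\mathbf{x}^{(3)} = \begin{bmatrix} .454 \\ .546 \end{bmatrix}$, $\mathbf{x}^{(4)} = \begin{bmatrix} .4546 \\ .5454 \end{bmatrix}$, $\mathbf{x}^{(5)} = \begin{bmatrix} .45454 \\ .54546 \end{bmatrix}$
(b) $P$ is regular since all entries of $P$ are positive; $\mathbf{q} = \begin{bmatrix} \frac{5}{11} \\ \frac{6}{11} \end{bmatrix}$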
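2. Consider the transition matrix
\[
P = \begin{bmatrix}
.2 & .1 & .7 \\
.6 & .4 & .2 \\
.2 & .5 & .1
\end{bmatrix}
\]
(a) Calculate $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$, and $\mathbf{x}^{(3)}$ to three decimal places if
\[ \mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \]
(b) State why $P$ is regular and find its steady-state vector.

Answer:
(a) $\mathbf{x}^{(1)} = \begin{bmatrix} .7 \\ .2 \\ .1 \end{bmatrix}$, $\mathbf{x}^{(2)} = \begin{bmatrix} .23 \\ .52 \\ .25 \end{bmatrix}$, $\mathbf{x}^{(3)} = \begin{bmatrix} .273 \\ .396 \\ .331 \end{bmatrix}$
(b) $P$ is regular, since all entries of $P$ are positive; $\mathbf{q} = \begin{bmatrix} \frac{22}{72} \\ \frac{29}{72} \\ \frac{21}{72} \end{bmatrix}$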
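3. Find the steady-state vectors of the following regular transition matrices:
(a) $\begin{bmatrix} \frac13 & \frac34 \\ \frac23 & \frac14 \end{bmatrix}$
(b) $\begin{bmatrix} .81 & .26 \\ .19 & .74 \end{bmatrix}$
(c) $\begin{bmatrix} \frac13 & \frac12 & 0 \\ \frac13 & 0 & \frac14 \\ \frac13 & \frac12 & \frac34 \end{bmatrix}$

Answer:
(a) $\begin{bmatrix} \frac{9}{17} \\ \frac{8}{17} \end{bmatrix}$
(b) $\begin{bmatrix} \frac{26}{45} \\ \frac{19}{45} \end{bmatrix}$
(c) $\begin{bmatrix} \frac{3}{19} \\ \frac{4}{19} \\ \frac{12}{19} \end{bmatrix}$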

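4. Let $P$ be the transition matrix
\[ P = \begin{bmatrix} \frac12 & 0 \\ \frac12 & 1 \end{bmatrix} \]
(a) Show that $P$ is not regular.

(b) Show that as $n$ increases, $P^n\mathbf{x}^{(0)}$ approaches $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ for any initial state vector $\mathbf{x}^{(0)}$.

(c) What conclusion of Theorem 10.5.3 is not valid for the steady state of this transition matrix?
Answer:

(a) $P^n = \begin{bmatrix} \left(\frac12\right)^n & 0 \\ 1 - \left(\frac12\right)^n & 1 \end{bmatrix}$, $n = 1, 2, \ldots$. Thus, no integer power of $P$ has all positive entries.
(b) $P^n \to \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}$ as $n$ increases, so $P^n\mathbf{x}^{(0)} \to \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ for any $\mathbf{x}^{(0)}$ as $n$ increases.

(c) The entries of the limiting vector $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ are not all positive.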


5. Verify that if $P$ is a $k \times k$ regular transition matrix all of whose row sums are equal to 1, then the entries of its
steady-state vector are all equal to $1/k$.


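6. Show that the transition matrix
\[
P = \begin{bmatrix}
0 & \frac12 & \frac12 \\
\frac12 & \frac12 & 0 \\
\frac12 & 0 & \frac12
\end{bmatrix}
\]
is regular, and use Exercise 5 to find its steady-state vector.

Answer:
\[
P^2 = \begin{bmatrix}
\frac12 & \frac14 & \frac14 \\
\frac14 & \frac12 & \frac14 \\
\frac14 & \frac14 & \frac12
\end{bmatrix}
\]
has all positive entries; $\mathbf{q} = \begin{bmatrix} \frac13 \\ \frac13 \\ \frac13 \end{bmatrix}$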


7. John is either happy or sad. If he is happy one day, then he is happy the next day four times out of five. If he 
is sad one day, then he is sad the next day one time out of three. Over the long term, what are the chances that 
John is happy on any given day? 


Answer:

$\frac{10}{13}$

8. A country is divided into three demographic regions. It is found that each year 5% of the residents of region 1 
move to region 2, and 5% move to region 3. Of the residents of region 2, 15% move to region | and 10% 
move to region 3. And of the residents of region 3, 10% move to region 1 and 5% move to region 2. What 
percentage of the population resides in each of the three regions after a long period of time? 


Answer: 


$54\tfrac{1}{6}\%$ in region 1, $16\tfrac{2}{3}\%$ in region 2, and $29\tfrac{1}{6}\%$ in region 3


Section 10.5 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic 
proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be 
able to use your technology utility to solve many of the problems in the regular exercise sets. 


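T1. Consider the sequence of transition matrices
\[ \{P_2, P_3, P_4, \ldots\} \]
with

\[ P_2 = \begin{bmatrix} 0 & \frac{1}{2} \\ 1 & \frac{1}{2} \end{bmatrix}, \quad P_3 = \begin{bmatrix} 0 & 0 & \frac{1}{3} \\ 0 & \frac{1}{2} & \frac{1}{3} \\ 1 & \frac{1}{2} & \frac{1}{3} \end{bmatrix}, \]

\[ P_4 = \begin{bmatrix} 0 & 0 & 0 & \frac{1}{4} \\ 0 & 0 & \frac{1}{3} & \frac{1}{4} \\ 0 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \end{bmatrix}, \quad P_5 = \begin{bmatrix} 0 & 0 & 0 & 0 & \frac{1}{5} \\ 0 & 0 & 0 & \frac{1}{4} & \frac{1}{5} \\ 0 & 0 & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \\ 0 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \\ 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \end{bmatrix} \]

and so on.

(a) Use a computer to show that each of these four matrices is regular by computing their squares.

(b) Verify Theorem 10.5.2 by computing the 100th power of $P_k$ for k = 2, 3, 4, 5. Then make a conjecture as to
the limiting value of $P_k^n$ as $n \to \infty$ for all k = 2, 3, 4, ....

(c) Verify that the common column $\mathbf{q}_k$ of the limiting matrix you found in part (b) satisfies the equation
$P_k \mathbf{q}_k = \mathbf{q}_k$, as required by Theorem 10.5.4.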

T2. A mouse is placed in a box with nine rooms as shown in the accompanying figure. Assume that it is equally 

likely that the mouse goes through any door in the room or stays in the room. 

(a) Construct the $9 \times 9$ transition matrix for this problem and show that it is regular.

(b) Determine the steady-state vector for the matrix. 


(c) Use a symmetry argument to show that this problem may be solved using only a 3 x 3 matrix. 


Figure Ex-T2 




10.6 Graph Theory 


In this section we introduce matrix representations of relations among members of a set. We use matrix 
arithmetic to analyze these relationships. 


Prerequisites 


Matrix Addition and Multiplication 


Relations Among Members of a Set 


There are countless examples of sets with finitely many members in which some relation exists among 
members of the set. For example, the set could consist of a collection of people, animals, countries, 
companies, sports teams, or cities; and the relation between two members, A and B, of such a set could be that 
person A dominates person B, animal A feeds on animal B, country A militarily supports country B, company 
A sells its product to company B, sports team A consistently beats sports team B, or city A has a direct airline 
flight to city B. 


We will now show how the theory of directed graphs can be used to mathematically model relations such as 
those in the preceding examples. 


Directed Graphs 


A directed graph is a finite set of elements, $\{P_1, P_2, \ldots, P_n\}$, together with a finite collection of ordered
pairs $(P_i, P_j)$ of distinct elements of this set, with no ordered pair being repeated. The elements of the set are
called vertices, and the ordered pairs are called directed edges, of the directed graph. We use the notation
$P_i \to P_j$ (which is read "$P_i$ is connected to $P_j$") to indicate that the directed edge $(P_i, P_j)$ belongs to the
directed graph. Geometrically, we can visualize a directed graph (Figure 10.6.1) by representing the vertices
as points in the plane and representing the directed edge $P_i \to P_j$ by drawing a line or arc from vertex $P_i$ to
vertex $P_j$, with an arrow pointing from $P_i$ to $P_j$. If both $P_i \to P_j$ and $P_j \to P_i$ hold (denoted $P_i \leftrightarrow P_j$), we
draw a single line between $P_i$ and $P_j$ with two oppositely pointing arrows (as with $P_2$ and $P_3$ in the figure).


Figure 10.6.1 


As in Figure 10.6.1, for example, a directed graph may have separate "components" of vertices that are
connected only among themselves; and some vertices, such as $P_5$, may not be connected with any other
vertex. Also, because $P_i \to P_i$ is not permitted in a directed graph, a vertex cannot be connected with itself by
a single arc that does not pass through any other vertex.


Figure 10.6.2 shows diagrams representing three more examples of directed graphs. With a directed graph
having n vertices, we may associate an $n \times n$ matrix $M = [m_{ij}]$, called the vertex matrix of the directed
graph. Its elements are defined by

\[ m_{ij} = \begin{cases} 1, & \text{if } P_i \to P_j \\ 0, & \text{otherwise} \end{cases} \]

for $i, j = 1, 2, \ldots, n$. For the three directed graphs in Figure 10.6.2, the corresponding vertex matrices are
010 0 
{9 010 
Figure a: M= 604 
000 0 
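Figure b: \[ M = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 \end{bmatrix} \]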
010 0 
1010 
F ; M= 
ye 1001 
oe STi aa 8 


Figure 10.6.2


By their definition, vertex matrices have the following two properties: 
(i) All entries are either 0 or 1. 
(ii) All diagonal entries are 0. 


Conversely, any matrix with these two properties determines a unique directed graph having the given matrix 
as its vertex matrix. For example, the matrix 


orosn 
cocoon 
COoOru 
ores 


determines the directed graph in Figure 10.6.3. 


Figure 10.6.3


EXAMPLE 1 Influences Within a Family


A certain family consists of a mother, father, daughter, and two sons. The family members have 
influence, or power, over each other in the following ways: the mother can influence the 
daughter and the oldest son; the father can influence the two sons; the daughter can influence 
the father; the oldest son can influence the youngest son; and the youngest son can influence the 
mother. We may model this family influence pattern with a directed graph whose vertices are 
the five family members. If family member A influences family member B, we write $A \to B$.
Figure 10.6.4 is the resulting directed graph, where we have used obvious letter designations for 
the five family members. The vertex matrix of this directed graph is 


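\[ \begin{array}{c|ccccc} & M & F & D & OS & YS \\ \hline M & 0 & 0 & 1 & 1 & 0 \\ F & 0 & 0 & 0 & 1 & 1 \\ D & 0 & 1 & 0 & 0 & 0 \\ OS & 0 & 0 & 0 & 0 & 1 \\ YS & 1 & 0 & 0 & 0 & 0 \end{array} \]


Figure 10.6.4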
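
The construction in Example 1 can be sketched in code: list the influence relations as ordered pairs and set the matching entries of the vertex matrix to 1. This is only an illustrative sketch; the helper name `vertex_matrix` and the string labels are assumptions introduced here, not part of the text.

```python
# Family members in the order used for the rows and columns of the vertex matrix.
members = ["M", "F", "D", "OS", "YS"]

# Influence relations stated in Example 1, as (influencer, influenced) pairs.
influences = [
    ("M", "D"), ("M", "OS"),
    ("F", "OS"), ("F", "YS"),
    ("D", "F"),
    ("OS", "YS"),
    ("YS", "M"),
]


def vertex_matrix(vertices, edges):
    """Return the vertex matrix: entry [i][j] is 1 when vertices[i] -> vertices[j]."""
    index = {name: k for k, name in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for source, target in edges:
        if source == target:
            # The definition of a directed graph excludes P_i -> P_i.
            raise ValueError("a vertex cannot be connected to itself")
        matrix[index[source]][index[target]] = 1
    return matrix


M = vertex_matrix(members, influences)
for name, row in zip(members, M):
    print(f"{name:>2}", row)
```

Each printed row should agree with the corresponding row of the vertex matrix in Example 1.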


EXAMPLE 2 Vertex Matrix: Moves on a Chessboard


In chess the knight moves in an “L”’-shaped pattern about the chessboard. For the board in 
Figure 10.6.5 it may move horizontally two squares and then vertically one square, or it may 
move vertically two squares and then horizontally one square. Thus, from the center square in 
the figure, the knight may move to any of the eight marked shaded squares. Suppose that the 
knight is restricted to the nine numbered squares in Figure 10.6.6. If by $i \to j$ we mean that the
knight may move from square i to square j, the directed graph in Figure 10.6.7 illustrates all


possible moves that the knight may make among these nine squares. In Figure 10.6.8 we have 
“unraveled” Figure 10.6.7 to make the pattern of possible moves clearer. 


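The vertex matrix of this directed graph is given by

\[ M = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0
\end{bmatrix} \]


Figure 10.6.7


Figure 10.6.8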
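
The vertex matrix of Example 2 can also be generated rather than written by hand. The sketch below assumes the nine squares are numbered 1 through 9 row by row, left to right; that numbering is an assumption (the figure is not reproduced here), although it is consistent with the matrix shown in the example. The function name `knight_vertex_matrix` is hypothetical.

```python
def knight_vertex_matrix(rows=3, cols=3):
    """Vertex matrix of knight moves on a rows-by-cols board.

    Squares are numbered row by row (an assumed numbering), so square k
    (zero-based) sits at row k // cols and column k % cols.
    """
    jumps = [(1, 2), (1, -2), (-1, 2), (-1, -2),
             (2, 1), (2, -1), (-2, 1), (-2, -1)]
    n = rows * cols
    matrix = [[0] * n for _ in range(n)]
    for square in range(n):
        r, c = divmod(square, cols)
        for dr, dc in jumps:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                matrix[square][nr * cols + nc] = 1
    return matrix


M = knight_vertex_matrix()
for row in M:
    print("".join(str(entry) for entry in row))
```

The fifth row is all zeros because the knight cannot move anywhere from the center square of a 3 x 3 board.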


In Example 1 the father cannot directly influence the mother; that is, $F \to M$ is not true. But he can influence
the youngest son, who can then influence the mother. We write this as $F \to YS \to M$ and call it a 2-step
connection from F to M. Analogously, we call $M \to D$ a 1-step connection, $F \to OS \to YS \to M$ a 3-step
connection, and so forth. Let us now consider a technique for finding the number of all possible r-step
connections ($r = 1, 2, \ldots$) from one vertex $P_i$ to another vertex $P_j$ of an arbitrary directed graph. (This will
include the case when $P_i$ and $P_j$ are the same vertex.) The number of 1-step connections from $P_i$ to $P_j$ is
simply $m_{ij}$. That is, there is either zero or one 1-step connection from $P_i$ to $P_j$, depending on whether $m_{ij}$ is
zero or one. For the number of 2-step connections, we consider the square of the vertex matrix. If we let $m_{ij}^{(2)}$
be the $(i, j)$-th element of $M^2$, we have

\[ m_{ij}^{(2)} = m_{i1}m_{1j} + m_{i2}m_{2j} + \cdots + m_{in}m_{nj} \tag{1} \]

Now, if $m_{i1} = m_{1j} = 1$, there is a 2-step connection $P_i \to P_1 \to P_j$ from $P_i$ to $P_j$. But if either $m_{i1}$ or $m_{1j}$ is
zero, such a 2-step connection is not possible. Thus $P_i \to P_1 \to P_j$ is a 2-step connection if and only if
$m_{i1}m_{1j} = 1$. Similarly, for any $k = 1, 2, \ldots, n$, $P_i \to P_k \to P_j$ is a 2-step connection from $P_i$ to $P_j$ if and
only if the term $m_{ik}m_{kj}$ on the right side of (1) is one; otherwise, the term is zero. Thus, the right side of (1) is
the total number of 2-step connections from $P_i$ to $P_j$.


A similar argument will work for finding the number of 3-, 4-, ..., r-step connections from $P_i$ to $P_j$. In
general, we have the following result.


THEOREM 10.6.1 


Let M be the vertex matrix of a directed graph and let $m_{ij}^{(r)}$ be the $(i, j)$-th element of $M^r$. Then $m_{ij}^{(r)}$
is equal to the number of r-step connections from $P_i$ to $P_j$.


EXAMPLE 3 Using Theorem 10.6.1 


Figure 10.6.9 is the route map of a small airline that services the four cities $P_1$, $P_2$, $P_3$, $P_4$. As
a directed graph, its vertex matrix is 


We have that 


If we are interested in connections from city $P_4$ to city $P_3$, we may use Theorem 10.6.1 to find
their number. Because $m_{43} = 1$, there is one 1-step connection; because $m_{43}^{(2)} = 1$, there is one
2-step connection; and because $m_{43}^{(3)} = 3$, there are three 3-step connections. To verify this,
from Figure 10.6.9 we find


1-step connections from $P_4$ to $P_3$: $P_4 \to P_3$
2-step connections from $P_4$ to $P_3$: $P_4 \to P_2 \to P_3$
3-step connections from $P_4$ to $P_3$: $P_4 \to P_3 \to P_4 \to P_3$
$P_4 \to P_2 \to P_1 \to P_3$
$P_4 \to P_3 \to P_1 \to P_3$


Figure 10.6.9
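
Theorem 10.6.1 can be checked numerically by comparing entries of the powers of a vertex matrix with an explicit enumeration of connections. In the sketch below the 4 x 4 matrix `M` is an assumption: it was chosen to be consistent with the connection counts and routes listed in Example 3, but it should not be read as a transcription of the route map. The helper names `mat_mul`, `mat_pow`, and `connections` are likewise introduced only for illustration.

```python
def mat_mul(A, B):
    """Multiply two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(M, r):
    """Return M raised to the positive integer power r."""
    result = M
    for _ in range(r - 1):
        result = mat_mul(result, M)
    return result


def connections(M, start, end, r):
    """List every r-step connection from start to end as a list of vertex indices."""
    if r == 0:
        return [[start]] if start == end else []
    found = []
    for k, linked in enumerate(M[start]):
        if linked:
            for tail in connections(M, k, end, r - 1):
                found.append([start] + tail)
    return found


# ASSUMED vertex matrix, consistent with the routes listed in Example 3.
M = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]

start, end = 3, 2  # zero-based indices of P4 and P3
for r in (1, 2, 3):
    count = mat_pow(M, r)[start][end]
    routes = connections(M, start, end, r)
    print(f"{r}-step connections: {count}")
    for route in routes:
        print("   " + " -> ".join(f"P{v + 1}" for v in route))
```

For each r, the entry of the r-th power and the number of enumerated routes agree, which is the content of the theorem.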


Cliques 


In everyday language a “clique” is a closely knit group of people (usually three or more) that tends to 
communicate within itself and has no place for outsiders. In graph theory this concept is given a more precise 
meaning. 


DEFINITION 1 


A subset of a directed graph is called a clique if it satisfies the following three conditions: 
(i) The subset contains at least three vertices. 
(ii) For each pair of vertices $P_i$ and $P_j$ in the subset, both $P_i \to P_j$ and $P_j \to P_i$ are true.


(iii) The subset is as large as possible; that is, it is not possible to add another vertex to the subset and 
still satisfy condition (ii). 


This definition suggests that cliques are maximal subsets that are in perfect “communication” with each other. 
For example, if the vertices represent cities, and $P_i \to P_j$ means that there is a direct airline flight from city
$P_i$ to city $P_j$, then there is a direct flight between any two cities within a clique in either direction.


EXAMPLE 4 A Directed Graph with Two Cliques 


The directed graph illustrated in Figure 10.6.10 (which might represent the route map of an 
airline) has two cliques: 
$\{P_1, P_2, P_3, P_4\}$ and $\{P_3, P_4, P_6\}$


This example shows that a directed graph may contain several cliques and that a vertex may 
simultaneously belong to more than one clique. 


Figure 10.6.10


For simple directed graphs, cliques can be found by inspection. But for large directed graphs, it would be 
desirable to have a systematic procedure for detecting cliques. For this purpose, it will be helpful to define a 
matrix S = [s;;] related to a given directed graph as follows: 


\[ s_{ij} = \begin{cases} 1, & \text{if } P_i \leftrightarrow P_j \\ 0, & \text{otherwise} \end{cases} \]


The matrix S determines a directed graph that is the same as the given directed graph, with the exception that
the directed edges with only one arrow are deleted. For example, if the original directed graph is given by
Figure 10.6.11a, the directed graph that has S as its vertex matrix is given in Figure 10.6.11b. The matrix S
may be obtained from the vertex matrix M of the original directed graph by setting $s_{ij} = 1$ if $m_{ij} = m_{ji} = 1$
and setting $s_{ij} = 0$ otherwise.


Figure 10.6.11
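
The rule for obtaining S from M (keep an entry only when both directions are present) is short enough to sketch directly. The 4-vertex matrix below is a made-up illustration, not one of the graphs in the figures, and `symmetric_part` is a hypothetical helper name.

```python
def symmetric_part(M):
    """Return S with S[i][j] = 1 exactly when M[i][j] = M[j][i] = 1."""
    n = len(M)
    return [[1 if M[i][j] == 1 and M[j][i] == 1 else 0 for j in range(n)]
            for i in range(n)]


# Hypothetical graph: P1 <-> P2, P2 -> P3, P3 <-> P4, P4 -> P1.
M = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 0],
]

S = symmetric_part(M)
for row in S:
    print(row)
```

The one-way edges P2 -> P3 and P4 -> P1 disappear from S, while the two-way edges survive in both positions, so S is always a symmetric matrix.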


The following theorem, which uses the matrix S, is helpful for identifying cliques. 


THEOREM 10.6.2 Identifying Cliques 


Let $s_{ii}^{(3)}$ be the $(i, i)$-th element of $S^3$. Then a vertex $P_i$ belongs to some clique if and only if $s_{ii}^{(3)} \neq 0$.


Proof If $s_{ii}^{(3)} \neq 0$, then there is at least one 3-step connection from $P_i$ to itself in the modified directed graph
determined by S. Suppose it is $P_i \to P_j \to P_k \to P_i$. In the modified directed graph, all directed relations are
two-way, so we also have the connections $P_i \leftrightarrow P_j \leftrightarrow P_k \leftrightarrow P_i$. But this means that $\{P_i, P_j, P_k\}$ is either a
clique or a subset of a clique. In either case, $P_i$ must belong to some clique. The converse statement, "if $P_i$
belongs to a clique, then $s_{ii}^{(3)} \neq 0$," follows in a similar manner.


EXAMPLE 5 Using Theorem 10.6.2 


Suppose that a directed graph has as its vertex matrix 


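\[ M = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix} \]

Then

\[ S = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad S^3 = \begin{bmatrix} 0 & 3 & 0 & 2 \\ 3 & 0 & 2 & 0 \\ 0 & 2 & 0 & 1 \\ 2 & 0 & 1 & 0 \end{bmatrix} \]

Because all diagonal entries of $S^3$ are zero, it follows from Theorem 10.6.2 that the directed
graph has no cliques.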


EXAMPLE 6 Using Theorem 10.6.2 


Suppose that a directed graph has as its vertex matrix 


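\[ M = \begin{bmatrix} 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \end{bmatrix} \]

Then

\[ S = \begin{bmatrix} 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad S^3 = \begin{bmatrix} 2 & 4 & 0 & 4 & 3 \\ 4 & 2 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 4 & 3 & 0 & 2 & 1 \\ 3 & 1 & 0 & 1 & 0 \end{bmatrix} \]


The nonzero diagonal entries of $S^3$ are $s_{11}^{(3)}$, $s_{22}^{(3)}$, and $s_{44}^{(3)}$. Consequently, in the given directed
graph, $P_1$, $P_2$, and $P_4$ belong to cliques. Because a clique must contain at least three vertices,
the directed graph has only one clique, $\{P_1, P_2, P_4\}$.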
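
The test in Theorem 10.6.2 can be sketched as a short routine: build S, cube it, and report the vertices whose diagonal entry is nonzero. The function names below are illustrative assumptions; the two matrices are the vertex matrices of Examples 5 and 6.

```python
def mat_mul(A, B):
    """Multiply two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def symmetric_part(M):
    """Return S with S[i][j] = 1 exactly when M[i][j] = M[j][i] = 1."""
    n = len(M)
    return [[1 if M[i][j] == 1 and M[j][i] == 1 else 0 for j in range(n)]
            for i in range(n)]


def vertices_in_cliques(M):
    """Return the 1-based numbers of vertices whose diagonal entry of S^3 is nonzero."""
    S = symmetric_part(M)
    S3 = mat_mul(mat_mul(S, S), S)
    return [i + 1 for i in range(len(M)) if S3[i][i] != 0]


example_5 = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
]
example_6 = [
    [0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
]

print(vertices_in_cliques(example_5))  # [] : no vertex lies in a clique
print(vertices_in_cliques(example_6))  # [1, 2, 4]
```

Note that the routine only reports which vertices belong to some clique; as in Example 6, deciding how those vertices group into cliques still takes a further argument.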


Dominance-Directed Graphs 


In many groups of individuals or animals, there is a definite "pecking order" or dominance relation between
any two members of the group. That is, given any two individuals A and B, either A dominates B or B
dominates A, but not both. In terms of a directed graph in which $P_i \to P_j$ means $P_i$ dominates $P_j$, this means
that for all distinct pairs, either $P_i \to P_j$ or $P_j \to P_i$, but not both. In general, we have the following
definition.


DEFINITION 2 


A dominance-directed graph is a directed graph such that for any distinct pair of vertices $P_i$ and $P_j$,
either $P_i \to P_j$ or $P_j \to P_i$, but not both.


An example of a directed graph satisfying this definition is a league of n sports teams that play each other 
exactly one time, as in one round of a round-robin tournament in which no ties are allowed. If $P_i \to P_j$ means
that team $P_i$ beat team $P_j$ in their single match, it is easy to see that the definition of a dominance-directed
graph is satisfied. For this reason, dominance-directed graphs are sometimes called tournaments.


Figure 10.6.12 illustrates some dominance-directed graphs with three, four, and five vertices, respectively. In 
these three graphs, the circled vertices have the following interesting property: from each one there is either a 
1-step or a 2-step connection to any other vertex in its graph. In a sports tournament, these vertices would 
correspond to the most “powerful” teams in the sense that these teams either beat any given team or beat 
some other team that beat the given team. We can now state and prove a theorem that guarantees that any 
dominance-directed graph has at least one vertex with this property. 


THEOREM 10.6.3 Connections in Dominance-Directed Graphs 


In any dominance-directed graph, there is at least one vertex from which there is a 1-step or 2-step
connection to any other vertex. 


Proof Consider a vertex (there may be several) with the largest total number of 1-step and 2-step
connections to other vertices in the graph. By renumbering the vertices, we may assume that $P_1$ is such a
vertex. Suppose there is some vertex $P_i$ such that there is no 1-step or 2-step connection from $P_1$ to $P_i$. Then,
in particular, $P_1 \to P_i$ is not true, so that by definition of a dominance-directed graph, it must be that
$P_i \to P_1$. Next, let $P_k$ be any vertex such that $P_1 \to P_k$ is true. Then we cannot have $P_k \to P_i$, as then
$P_1 \to P_k \to P_i$ would be a 2-step connection from $P_1$ to $P_i$. Thus, it must be that $P_i \to P_k$. That is, $P_i$ has
1-step connections to all the vertices to which $P_1$ has 1-step connections. The vertex $P_i$ must then also have
2-step connections to all the vertices to which $P_1$ has 2-step connections. But because, in addition, we have
that $P_i \to P_1$, this means that $P_i$ has more 1-step and 2-step connections to other vertices than does $P_1$.
However, this contradicts the way in which $P_1$ was chosen. Hence, there can be no vertex $P_i$ to which $P_1$ has
no 1-step or 2-step connection.
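
As a sanity check on Theorem 10.6.3 (not a substitute for the proof), one can enumerate every dominance-directed graph on a small number of vertices and confirm that each has a vertex reaching all others in one or two steps. The sketch below does this by brute force for 3, 4, and 5 vertices; all function names are assumptions made for this illustration.

```python
from itertools import combinations, product


def tournaments(n):
    """Yield the vertex matrix of every dominance-directed graph on n vertices."""
    pairs = list(combinations(range(n), 2))
    for outcome in product([0, 1], repeat=len(pairs)):
        M = [[0] * n for _ in range(n)]
        for (i, j), i_dominates in zip(pairs, outcome):
            if i_dominates:
                M[i][j] = 1
            else:
                M[j][i] = 1
        yield M


def reaches_all_within_two_steps(M, i):
    """True when vertex i has a 1-step or 2-step connection to every other vertex."""
    n = len(M)
    for j in range(n):
        if j == i or M[i][j]:
            continue
        if not any(M[i][k] and M[k][j] for k in range(n)):
            return False
    return True


def theorem_holds(M):
    """True when at least one vertex has the property stated in Theorem 10.6.3."""
    return any(reaches_all_within_two_steps(M, i) for i in range(len(M)))


checked = {n: all(theorem_holds(M) for M in tournaments(n)) for n in (3, 4, 5)}
print(checked)
```

The enumeration grows as 2 to the power n(n-1)/2, so this approach is only practical for very small n.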


Figure 10.6.12


This proof shows that a vertex with the largest total number of 1-step and 2-step connections to other vertices
has the property stated in the theorem. There is a simple way of finding such vertices using the vertex matrix
M and its square $M^2$. The sum of the entries in the ith row of M is the total number of 1-step connections
from $P_i$ to other vertices, and the sum of the entries of the ith row of $M^2$ is the total number of 2-step
connections from $P_i$ to other vertices. Consequently, the sum of the entries of the ith row of the matrix
$A = M + M^2$ is the total number of 1-step and 2-step connections from $P_i$ to other vertices. In other words,
a row of $A = M + M^2$ with the largest row sum identifies a vertex having the property stated in Theorem
10.6.3.


EXAMPLE 7 Using Theorem 10.6.3


Suppose that five baseball teams play each other exactly once, and the results are as indicated in 
the dominance-directed graph of Figure 10.6.13. The vertex matrix of the graph is 


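\[ M = \begin{bmatrix} 0 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 \end{bmatrix} \]

so

\[ A = M + M^2 = \begin{bmatrix} 0 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 2 & 3 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 2 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 1 & 2 & 0 \\ 2 & 0 & 3 & 3 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 2 & 3 & 0 \end{bmatrix} \]


The row sums of A are
1st row sum = 4
2nd row sum = 9
3rd row sum = 2
4th row sum = 4
5th row sum = 7
Because the second row has the largest row sum, the vertex $P_2$ must have a 1-step or 2-step
connection to any other vertex. This is easily verified from Figure 10.6.13.


Figure 10.6.13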


We have informally suggested that a vertex with the largest number of 1-step and 2-step connections to other 
vertices is a “powerful” vertex. We can formalize this concept with the following definition. 


DEFINITION 3 


The power of a vertex of a dominance-directed graph is the total number of 1-step and 2-step 
connections from it to other vertices. Alternatively, the power of a vertex $P_i$ is the sum of the entries
of the ith row of the matrix $A = M + M^2$, where M is the vertex matrix of the directed graph.


EXAMPLE 8 Example 7 Revisited


Let us rank the five baseball teams in Example 7 according to their powers. From the
calculations for the row sums in that example, we have


Power of team $P_1$ = 4
Power of team $P_2$ = 9
Power of team $P_3$ = 2
Power of team $P_4$ = 4
Power of team $P_5$ = 7


Hence, the ranking of the teams according to their powers would be
$P_2$ (first), $P_5$ (second), $P_1$ and $P_4$ (tied for third), $P_3$ (last)
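
The power computation of Examples 7 and 8 can be sketched as follows: form the row sums of $A = M + M^2$ and sort the vertices by them. The function name `vertex_powers` is an assumption for this illustration; the matrix is the vertex matrix of Example 7.

```python
def vertex_powers(M):
    """Return the row sums of A = M + M^2, that is, the power of each vertex."""
    n = len(M)
    M2 = [[sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [sum(M[i][j] + M2[i][j] for j in range(n)) for i in range(n)]


# Vertex matrix of the five-team tournament in Example 7.
M = [
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
]

powers = vertex_powers(M)
# Sort team numbers by decreasing power; teams with equal power keep their numerical order.
ranking = sorted(range(1, len(M) + 1), key=lambda team: -powers[team - 1])

print(powers)   # [4, 9, 2, 4, 7]
print(ranking)  # [2, 5, 1, 4, 3]
```

Teams 1 and 4 have equal power, so their relative order in `ranking` reflects only the sort's tie handling, matching the tie for third place in Example 8.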


Exercise Set 10.6 


1. Construct the vertex matrix for each of the directed graphs illustrated in Figure Ex-1. 


Py 


(a) (5) 


P, 
P, 


P, 
Ps 


(c) 


Figure Ex-1 


Answer: 


or oOo Oo 2 
ewe ew CO COCO Tr OG 
oro oOo rr oon r 
oor oOo Tr oc oOo Oo 
Orr CO DFO r Oo OD 


010100 
1000 0 0 
O to Lt a 
000001 
00000 1 
001010 


(c) 
2. Draw a diagram of the directed graph corresponding to each of the following vertex matrices. 


onrr OO 
coro Cc Orc O&O 
= oO Oo rf fF CO Oo = 
moO cCcocUmcmOmDUmUOCUOUU COU 
oronr oro ort 
S & 


OL o' 1 Oo 
1000 1 0 
000000 
1100 1 0 
000101 
010010 


(c) 


Answer: 


(a) © 


P, 


Ps 


P, P, 
P Py P. 
(c) | . 
Ps Ps P,; 


3. Let M be the following vertex matrix of a directed graph: 


(a) Draw a diagram of the directed graph. 


(b) Use Theorem 10.6.1 to find the number of 1-, 2-, and 3-step connections from the vertex $P_1$ to the
vertex $P_2$. Verify your answer by listing the various connections as in Example 3.


(c) Repeat part (b) for the 1-, 2-, and 3-step connections from $P_1$ to $P_4$.
Answer: 


(a) 


P, P; 
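(b) 1-step: $P_1 \to P_2$
2-step: $P_1 \to P_4 \to P_2$
$P_1 \to P_3 \to P_2$
3-step: $P_1 \to P_2 \to P_1 \to P_2$
$P_1 \to P_3 \to P_4 \to P_2$
$P_1 \to P_4 \to P_3 \to P_2$
(c) 1-step: $P_1 \to P_4$
2-step: $P_1 \to P_3 \to P_4$
3-step: $P_1 \to P_2 \to P_1 \to P_4$
$P_1 \to P_4 \to P_3 \to P_4$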


4. (a) Compute the matrix product $M^T M$ for the vertex matrix M in Example 1.


(b) Verify that the kth diagonal entry of $M^T M$ is the number of family members who influence the kth
family member. Why is this true?


(c) Find a similar interpretation for the values of the nondiagonal entries of $M^T M$.


Answer:

(a) \[ \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 1 & 2 \end{bmatrix} \]


(c) The ijth entry is the number of family members who influence both the ith and jth family members.


5. By inspection, locate all cliques in each of the directed graphs illustrated in Figure Ex-5.


Figure Ex-5


Answer: 


(a) $\{P_1, P_2, P_3\}$
(b) $\{P_3, P_4, P_5\}$
(c) $\{P_2, P_4, P_6, P_8\}$ and $\{P_4, P_5, P_6\}$


6. For each of the following vertex matrices, use Theorem 10.6.2 to find all cliques in the corresponding 
directed graphs. 


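(a) \[ \begin{bmatrix} 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 \end{bmatrix} \]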

(b)}0 1011 =°0 
103101121 
01010 1 
1031011 
01010 0 
00111 0 

Answer: 

(a) None 


(b) $\{P_3, P_4, P_6\}$


7. For the dominance-directed graph illustrated in Figure Ex-7 construct the vertex matrix and find the power 
of each vertex. 


Figure Ex-7
Answer:
\[ \begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{bmatrix} \]
Power of $P_1$ = 5
Power of $P_2$ = 3
Power of $P_3$ = 4
Power of $P_4$ = 2
8. Five baseball teams play each other one time with the following results:
A beats B, C, D
B beats C, E
C beats D, E
D beats B
E beats A, D


Rank the five baseball teams in accordance with the powers of the vertices they correspond to in the 
dominance-directed graph representing the outcomes of the games. 


Answer: 


First, A; second, B and E (tie); fourth, C; fifth, D 
Section 10.6 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the 
relevant documentation for the particular utility you are using. The goal of these exercises is to provide you 
with a basic proficiency with your technology utility. Once you have mastered the techniques in these 
exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 


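T1. A graph having n vertices such that every vertex is connected to every other vertex has a vertex matrix
given by

\[ M_n = \begin{bmatrix} 0 & 1 & 1 & 1 & 1 & \cdots & 1 \\ 1 & 0 & 1 & 1 & 1 & \cdots & 1 \\ 1 & 1 & 0 & 1 & 1 & \cdots & 1 \\ 1 & 1 & 1 & 0 & 1 & \cdots & 1 \\ 1 & 1 & 1 & 1 & 0 & \cdots & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & 1 & 1 & 1 & \cdots & 0 \end{bmatrix} \]

In this problem we develop a formula for $M_n^k$ whose $(i, j)$-th entry equals the number of k-step connections
from $P_i$ to $P_j$.
(a) Use a computer to compute the eight matrices $M_n^k$ for n = 2, 3 and for k = 2, 3, 4, 5.


(b) Use the results in part (a) and symmetry arguments to show that $M_n^k$ can be written as

\[ M_n^k = \begin{bmatrix} 0 & 1 & 1 & \cdots & 1 \\ 1 & 0 & 1 & \cdots & 1 \\ 1 & 1 & 0 & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & 1 & \cdots & 0 \end{bmatrix}^k = \begin{bmatrix} \alpha_k & \beta_k & \beta_k & \cdots & \beta_k \\ \beta_k & \alpha_k & \beta_k & \cdots & \beta_k \\ \beta_k & \beta_k & \alpha_k & \cdots & \beta_k \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \beta_k & \beta_k & \beta_k & \cdots & \alpha_k \end{bmatrix} \]


(c) Using the fact that $M_n^k = M_n M_n^{k-1}$, show that

\[ \begin{bmatrix} \alpha_k \\ \beta_k \end{bmatrix} = \begin{bmatrix} 0 & n-1 \\ 1 & n-2 \end{bmatrix} \begin{bmatrix} \alpha_{k-1} \\ \beta_{k-1} \end{bmatrix} \]

with

\[ \begin{bmatrix} \alpha_1 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \]

(d) Using part (c), show that

\[ \begin{bmatrix} \alpha_k \\ \beta_k \end{bmatrix} = \begin{bmatrix} 0 & n-1 \\ 1 & n-2 \end{bmatrix}^{k-1} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \]

(e) Use the methods of Section 5.2 to compute

\[ \begin{bmatrix} 0 & n-1 \\ 1 & n-2 \end{bmatrix}^{k-1} \]

and thereby obtain expressions for $\alpha_k$ and $\beta_k$, and eventually show that

\[ M_n^k = \left( \frac{(n-1)^k - (-1)^k}{n} \right) U_n + (-1)^k I_n \]

where $U_n$ is the $n \times n$ matrix all of whose entries are ones and $I_n$ is the $n \times n$ identity matrix.


(f) Show that for $n > 2$, all vertices for these directed graphs belong to cliques.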


T2. Consider a round-robin tournament among n players (labeled $a_1, a_2, a_3, \ldots, a_n$) where $a_1$ beats $a_2$, $a_2$
beats $a_3$, $a_3$ beats $a_4$, ..., $a_{n-1}$ beats $a_n$, and $a_n$ beats $a_1$. Compute the "power" of each player, showing that
they all have the same power; then determine that common power.

[Hint: Use a computer to study the cases n = 3, 4, 5, 6; then make a conjecture and prove your conjecture to
be true.]




10.7 Games of Strategy 


In this section we discuss a general game in which two competing players choose separate strategies to reach opposing 
objectives. The optimal strategy of each player is found in certain cases with the use of matrix techniques. 


Prerequisites 


Matrix Multiplication 
Basic Probability Concepts 


Game Theory 


To introduce the basic concepts in the theory of games, we will consider the following carnival-type game that two 
people agree to play. We will call the participants in the game player R and player C. Each player has a stationary wheel 
with a movable pointer on it as in Figure 10.7.1. For reasons that will become clear, we will call player R's wheel the 
row-wheel and player C's wheel the column-wheel. The row-wheel is divided into three sectors numbered 1, 2, and 3, 
and the column-wheel is divided into four sectors numbered 1, 2, 3, and 4. The fractions of the area occupied by the 
various sectors are indicated in the figure. To play the game, each player spins the pointer of his or her wheel and lets it 
come to rest at random. The number of the sector in which each pointer comes to rest is called the move of that player. 
Thus, player R has three possible moves and player C has four possible moves. Depending on the move each player 
makes, player C then makes a payment of money to player R according to Table 1. 


Row-wheel 
of player R 


Column-wheel 
of player C 


Figure 10.7.1 


Table 1

                         Player C's Move
                       1      2      3      4
Player R's Move   1    $3     $5    -$2    -$1
                  2   -$2     $4    -$3    -$4
                  3    $6    -$5     $0     $3

For example, if the row-wheel pointer comes to rest in sector 1 (player R makes move 1), and the column-wheel pointer 
comes to rest in sector 2 (player C makes move 2), then player C must pay player R the sum of $5. Some of the entries 
in this table are negative, indicating that player C makes a negative payment to player R. By this we mean that player R 
makes a positive payment to player C. For example, if the row-wheel shows 2 and the column-wheel shows 4, then 
player R pays player C the sum of $4, because the corresponding entry in the table is —$4. In this way the positive entries 
of the table are the gains of player R and the losses of player C, and the negative entries are the gains of player C and the 
losses of player R. 


In this game the players have no control over their moves; each move is determined by chance. However, if each player 
can decide whether he or she wants to play, then each would want to know how much he or she can expect to win or lose 
over the long term if he or she chooses to play. (Later in the section we will discuss this question and also consider a 
more complicated situation in which the players can exercise some control over their moves by varying the sectors of 
their wheels.) 


Two-Person Zero-Sum Matrix Games 


The game described above is an example of a two-person zero-sum matrix game. The term zero-sum means that in each 
play of the game, the positive gain of one player is equal to the negative gain (loss) of the other player. That is, the sum 
of the two gains is zero. The term matrix game is used to describe a two-person game in which each player has only a 
finite number of moves, so that all possible outcomes of each play, and the corresponding gains of the players, can be 
displayed in tabular or matrix form, as in Table 1. 


In a general game of this type, let player R have m possible moves and let player C have n possible moves. In a play of 
the game, each player makes one of his or her possible moves, and then a payoff is made from player C to player R,
depending on the moves. For $i = 1, 2, \ldots, m$, and $j = 1, 2, \ldots, n$, let us set
$a_{ij}$ = payoff that player C makes to player R if player R
makes move i and player C makes move j


This payoff need not be money; it can be any type of commodity to which we can attach a numerical value. As before, if
an entry $a_{ij}$ is negative, we mean that player C receives a payoff of $|a_{ij}|$ from player R. We arrange these mn possible
payoffs in the form of an $m \times n$ matrix

\[ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \]


which we will call the payoff matrix of the game. 


Each player is to make his or her moves on a probabilistic basis. For example, for the game discussed in the
introduction, the ratio of the area of a sector to the area of the wheel would be the probability that the player makes the
move corresponding to that sector. Thus, from Figure 10.7.1, we see that player R would make move 2 with probability
$\frac{1}{3}$ and player C would make move 2 with probability $\frac{1}{4}$. In the general case we make the following definitions:


$p_i$ = probability that player R makes move i ($i = 1, 2, \ldots, m$)
$q_j$ = probability that player C makes move j ($j = 1, 2, \ldots, n$)
It follows from these definitions that
\[ p_1 + p_2 + \cdots + p_m = 1 \]
and
\[ q_1 + q_2 + \cdots + q_n = 1 \]
With the probabilities $p_i$ and $q_j$ we form two vectors:
\[ \mathbf{p} = \begin{bmatrix} p_1 & p_2 & \cdots & p_m \end{bmatrix} \quad \text{and} \quad \mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{bmatrix} \]


We call the row vector $\mathbf{p}$ the strategy of player R and the column vector $\mathbf{q}$ the strategy of player C. For example, from
Figure 10.7.1 we have

\[ \mathbf{p} = \begin{bmatrix} \frac{1}{6} & \frac{1}{3} & \frac{1}{2} \end{bmatrix} \quad \text{and} \quad \mathbf{q} = \begin{bmatrix} \frac{1}{4} \\ \frac{1}{4} \\ \frac{1}{3} \\ \frac{1}{6} \end{bmatrix} \]

for the carnival game described earlier.


From the theory of probability, if the probability that player R makes move i is $p_i$, and independently the probability that
player C makes move j is $q_j$, then $p_i q_j$ is the probability that for any one play of the game, player R makes move i and
player C makes move j. The payoff to player R for such a pair of moves is $a_{ij}$. If we multiply each possible payoff by its
corresponding probability and sum over all possible payoffs, we obtain the expression


\[ a_{11}p_1q_1 + a_{12}p_1q_2 + \cdots + a_{1n}p_1q_n + a_{21}p_2q_1 + \cdots + a_{mn}p_mq_n \tag{1} \]


Equation 1 is a weighted average of the payoffs to player R; each payoff is weighted according to the probability of its
occurrence. In the theory of probability, this weighted average is called the expected payoff to player R. It can be shown
that if the game is played many times, the long-term average payoff per play to player R is given by this expression. We
denote this expected payoff by $E(\mathbf{p}, \mathbf{q})$ to emphasize the fact that it depends on the strategies of the two players. From
the definition of the payoff matrix A and the strategies $\mathbf{p}$ and $\mathbf{q}$, it can be verified that we may express the expected
payoff in matrix notation as


\[ E(\mathbf{p}, \mathbf{q}) = \begin{bmatrix} p_1 & p_2 & \cdots & p_m \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{bmatrix} = \mathbf{p}A\mathbf{q} \tag{2} \]


Because $E(\mathbf{p}, \mathbf{q})$ is the expected payoff to player R, it follows that $-E(\mathbf{p}, \mathbf{q})$ is the expected payoff to player C.


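EXAMPLE 1 Expected Payoff to Player R


For the carnival game described earlier, we have


\[ E(\mathbf{p}, \mathbf{q}) = \mathbf{p}A\mathbf{q} = \begin{bmatrix} \frac{1}{6} & \frac{1}{3} & \frac{1}{2} \end{bmatrix} \begin{bmatrix} 3 & 5 & -2 & -1 \\ -2 & 4 & -3 & -4 \\ 6 & -5 & 0 & 3 \end{bmatrix} \begin{bmatrix} \frac{1}{4} \\ \frac{1}{4} \\ \frac{1}{3} \\ \frac{1}{6} \end{bmatrix} = \frac{13}{72} = .1805\ldots \]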


Thus, in the long run, player R can expect to receive an average of about 18 cents from player C in each 
play of the game. 
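
The expected payoff in Example 1 can be reproduced with exact rational arithmetic by summing $a_{ij} p_i q_j$ over all pairs of moves, as in Equation 1. This is a sketch only: the function name `expected_payoff` is an assumption, and the payoff matrix and strategies are the values of Example 1 as they are read here.

```python
from fractions import Fraction


def expected_payoff(p, A, q):
    """Return E(p, q) = p A q as the double sum of a_ij * p_i * q_j."""
    return sum(p[i] * A[i][j] * q[j]
               for i in range(len(p))
               for j in range(len(q)))


# Strategies and payoff matrix of the carnival game (values as read from Example 1).
p = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]
q = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 3), Fraction(1, 6)]
A = [
    [3, 5, -2, -1],
    [-2, 4, -3, -4],
    [6, -5, 0, 3],
]

E = expected_payoff(p, A, q)
print(E)         # 13/72
print(float(E))  # approximately 0.1806
```

Using `Fraction` avoids rounding, so the result can be compared directly with the exact value 13/72; the expected payoff to player C is then `-E`.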


So far we have been discussing the situation in which each player has a predetermined strategy. We will now consider 
the more difficult situation in which both players can change their strategies independently. For example, in the game 
described in the introduction, we would allow both players to alter the areas of the sectors of their wheels and thereby 
control the probabilities of their respective moves. This qualitatively changes the nature of the problem and puts us 
firmly in the field of true game theory. It is understood that neither player knows what strategy the other will choose. It 
is also assumed that each player will make the best possible choice of strategy and that the other player knows this. 
Thus, player R attempts to choose a strategy p such that #'(p, q) is as large as possible for the best strategy q that player 
C can choose; and similarly, player C attempts to choose a strategy q such that #(p, q) is as small as possible for the 
best strategy p that player R can choose. To see that such choices are actually possible, we will need the following 
theorem, called the Fundamental Theorem of Two-Person Zero-Sum Games. (The general proof, which involves ideas 
from the theory of linear programming, will be omitted. However, below we will prove this theorem for what are called 
strictly determined games and 2 × 2 matrix games.) 


THEOREM 10.7.1. Fundamental Theorem of Zero-Sum Games 


There exist strategies p* and q* such that 


$$E(\mathbf{p}^*, \mathbf{q}) \ge E(\mathbf{p}^*, \mathbf{q}^*) \ge E(\mathbf{p}, \mathbf{q}^*) \tag{3}$$


for all strategies p and q. 


The strategies p* and q* in this theorem are the best possible strategies for players R and C, respectively. To see why 
this is so, let v = E(p*, q*). The left-hand inequality of Equation 3 then reads 

$$E(\mathbf{p}^*, \mathbf{q}) \ge v \quad \text{for all strategies } \mathbf{q}$$

This means that if player R chooses the strategy p*, then no matter what strategy q player C chooses, the expected 
payoff to player R will never be below v. Moreover, it is not possible for player R to achieve an expected payoff greater 
than v. To see why, suppose there is some strategy p** that player R can choose such that 

$$E(\mathbf{p}^{**}, \mathbf{q}) > v \quad \text{for all strategies } \mathbf{q}$$

Then, in particular, 

$$E(\mathbf{p}^{**}, \mathbf{q}^*) > v$$

But this contradicts the right-hand inequality of Equation 3, which requires that v ≥ E(p**, q*). Consequently, the best 
player R can do is prevent his or her expected payoff from falling below the value v. Similarly, the best player C can do 
is ensure that player R's expected payoff does not exceed v, and this can be achieved by using strategy q*. 


On the basis of this discussion, we arrive at the following definitions. 


DEFINITION 1 


If p* and q* are strategies such that 

$$E(\mathbf{p}^*, \mathbf{q}) \ge E(\mathbf{p}^*, \mathbf{q}^*) \ge E(\mathbf{p}, \mathbf{q}^*) \tag{4}$$

for all strategies p and q, then 
(i) p* is called an optimal strategy for player R. 
(ii) q* is called an optimal strategy for player C. 
(iii) v = E(p*, q*) is called the value of the game. 


The wording in this definition suggests that optimal strategies are not necessarily unique. This is indeed the case, and in 
Exercise 2 we ask you to show this. However, it can be proved that any two sets of optimal strategies always result in 
the same value v of the game. That is, if p*, q* and p**, q** are optimal strategies, then 

$$E(\mathbf{p}^*, \mathbf{q}^*) = E(\mathbf{p}^{**}, \mathbf{q}^{**}) \tag{5}$$

The value of a game is thus the expected payoff to player R when both players choose any possible optimal strategies. 


To find optimal strategies, we must find vectors p* and q* that satisfy Equation 4. This is generally done by using linear 


programming techniques. Next, we discuss special cases for which optimal strategies may be found by more elementary 
techniques. 


We now introduce the following definition. 


DEFINITION 2 


An entry $a_{rs}$ in a payoff matrix A is called a saddle point if 
(i) $a_{rs}$ is the smallest entry in its row, and 
(ii) $a_{rs}$ is the largest entry in its column. 


A game whose payoff matrix has a saddle point is called strictly determined. 


For example, the shaded element in each of the following payoff matrices is a saddle point: 


3 5 30-50 —5 16 -—8 —2 10 
ale 60 90 75), 
—4 —10 60 —30 7 10 6 2 
6 11 —3 2 


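If a matrix has a saddle point $a_{rs}$, it turns out that the following strategies are optimal strategies for the two players: 

$$\mathbf{p}^* = \begin{bmatrix} 0 & 0 & \cdots & 1 & \cdots & 0 \end{bmatrix} \ (\text{the 1 is the } r\text{th entry}), \qquad \mathbf{q}^* = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \ (\text{the 1 is the } s\text{th entry})$$


That is, an optimal strategy for player R is to always make the rth move, and an optimal strategy for player C is to 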
always make the sth move. Such strategies for which only one move is possible are called pure strategies. Strategies for 
which more than one move is possible are called mixed strategies. To show that the above pure strategies are optimal, 
you can verify the following three equations (see Exercise 6): 


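$$E(\mathbf{p}^*, \mathbf{q}^*) = \mathbf{p}^* A \mathbf{q}^* = a_{rs} \tag{6}$$
$$E(\mathbf{p}^*, \mathbf{q}) = \mathbf{p}^* A \mathbf{q} \ge a_{rs} \quad \text{for any strategy } \mathbf{q} \tag{7}$$
$$E(\mathbf{p}, \mathbf{q}^*) = \mathbf{p} A \mathbf{q}^* \le a_{rs} \quad \text{for any strategy } \mathbf{p} \tag{8}$$


Together, these three equations imply that 

$$E(\mathbf{p}^*, \mathbf{q}) \ge E(\mathbf{p}^*, \mathbf{q}^*) \ge E(\mathbf{p}, \mathbf{q}^*)$$

for all strategies p and q. Because this is exactly Equation 4, it follows that p* and q* are optimal strategies. 


From Equation 6 the value of a strictly determined game is simply the numerical value of a saddle point $a_{rs}$. It is 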
possible for a payoff matrix to have several saddle points, but then the uniqueness of the value of a game guarantees that 
the numerical values of all saddle points are the same. 


EXAMPLE 2 Optimal Strategies to Maximize a Viewing Audience 


Two competing television networks, R and C, are scheduling one-hour programs in the same time period. 
Network R can schedule one of three possible programs, and network C can schedule one of four possible 
programs. Neither network knows which program the other will schedule. Both networks ask the same 
outside polling agency to give them an estimate of how all possible pairings of the programs will divide the 
viewing audience. The agency gives them each Table 2, whose (i, j)-th entry is the percentage of the 
viewing audience that will watch network R if network R's program i is paired against network C's program 
j. What program should each network schedule in order to maximize its viewing audience? 


Table 2 

                             Network C's Program 
                              1     2     3     4 
Network R's Program    1     60    20    30    55 
                       2     50    75    45    60 
                       3     70    45    35    30 


Solution Subtract 50 from each entry in Table 2 to construct the following matrix: 

$$\begin{bmatrix} 10 & -30 & -20 & 5 \\ 0 & 25 & -5 & 10 \\ 20 & -5 & -15 & -20 \end{bmatrix}$$

This is the payoff matrix of the two-person zero-sum game in which each network is considered to start 
with 50% of the audience, and the (i, j)-th entry of the matrix is the percentage of the viewing audience 
that network C loses to network R if programs i and j are paired against each other. It is easy to see that the 
entry 

$$a_{23} = -5$$
is a saddle point of the payoff matrix. Hence, the optimal strategy of network R is to schedule program 2, 
and the optimal strategy of network C is to schedule program 3. This will result in network R's receiving 
45% of the audience and network C's receiving 55% of the audience. 
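
Definition 2 translates directly into a search over the entries of a payoff matrix. The sketch below is illustrative only: the function name `saddle_points` is our own, and the matrix is the one from Example 2 as read from the text, with the (2, 2) entry taken to be 25 (an assumption that does not affect the saddle point).

```python
def saddle_points(A):
    """Return (r, s, value) triples, 1-based, for every saddle point of A.

    An entry qualifies if it is the smallest in its row and the
    largest in its column (Definition 2).
    """
    points = []
    for r, row in enumerate(A):
        for s, entry in enumerate(row):
            column = [A[i][s] for i in range(len(A))]
            if entry == min(row) and entry == max(column):
                points.append((r + 1, s + 1, entry))
    return points

# Payoff matrix of Example 2 (Table 2 with 50 subtracted from each entry).
A = [[10, -30, -20, 5],
     [0, 25, -5, 10],
     [20, -5, -15, -20]]

print(saddle_points(A))  # [(2, 3, -5)]
```

An empty list means the game is not strictly determined, in which case pure strategies are not optimal and other techniques are needed.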


2 x 2 Matrix Games 


Another case in which the optimal strategies can be found by elementary means occurs when each player has only two 
possible moves. In this case, the payoff matrix is a 2 × 2 matrix 

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

If the game is strictly determined, at least one of the four entries of A is a saddle point, and the techniques discussed 


above can then be applied to determine optimal strategies for the two players. If the game is not strictly determined, we 
first compute the expected payoff for arbitrary strategies p and q: 


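$$E(\mathbf{p}, \mathbf{q}) = \mathbf{p}A\mathbf{q} = \begin{bmatrix} p_1 & p_2 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = a_{11}p_1q_1 + a_{12}p_1q_2 + a_{21}p_2q_1 + a_{22}p_2q_2 \tag{9}$$

Because 

$$p_1 + p_2 = 1 \quad \text{and} \quad q_1 + q_2 = 1 \tag{10}$$

we may substitute $p_2 = 1 - p_1$ and $q_2 = 1 - q_1$ into Equation 9 to obtain 

$$E(\mathbf{p}, \mathbf{q}) = a_{11}p_1q_1 + a_{12}p_1(1 - q_1) + a_{21}(1 - p_1)q_1 + a_{22}(1 - p_1)(1 - q_1) \tag{11}$$

If we rearrange the terms in Equation 11, we can write 

$$E(\mathbf{p}, \mathbf{q}) = \left[(a_{11} + a_{22} - a_{12} - a_{21})p_1 - (a_{22} - a_{21})\right]q_1 + (a_{12} - a_{22})p_1 + a_{22} \tag{12}$$

By examining the coefficient of the $q_1$ term in Equation 12, we see that if we set 

$$p_1 = p_1^* = \frac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{13}$$

then that coefficient is zero, and Equation 12 reduces to 

$$E(\mathbf{p}^*, \mathbf{q}) = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{14}$$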


Equation 14 is independent of q; that is, if player R chooses the strategy determined by 13, player C cannot change the 
expected payoff by varying his or her strategy. 


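In a similar manner, it can be verified that if player C chooses the strategy determined by 

$$q_1 = q_1^* = \frac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{15}$$

then substituting in Equation 12 gives 

$$E(\mathbf{p}, \mathbf{q}^*) = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{16}$$

Equations 14 and 16 show that 

$$E(\mathbf{p}^*, \mathbf{q}) = E(\mathbf{p}^*, \mathbf{q}^*) = E(\mathbf{p}, \mathbf{q}^*) \tag{17}$$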


for all strategies p and q. Thus, the strategies determined by 13, 15, and 10 are optimal strategies for players R and C, 
respectively, and so we have the following result. 


THEOREM 10.7.2 Optimal Strategies for a 2 x 2 Matrix Game 


For a 2 × 2 game that is not strictly determined, optimal strategies for players R and C are 

$$\mathbf{p}^* = \begin{bmatrix} \dfrac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} & \dfrac{a_{11} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} \end{bmatrix}$$

and 

$$\mathbf{q}^* = \begin{bmatrix} \dfrac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} \\[2ex] \dfrac{a_{11} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \end{bmatrix}$$

The value of the game is 

$$v = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}}$$


In order to be complete, we must show that the entries in the vectors p* and q* are numbers strictly between 0 and 1. In 
Exercise 8 we ask you to show that this is the case as long as the game is not strictly determined. 


Equation 17 is interesting in that it implies that either player can force the expected payoff to be the value of the game 
by choosing his or her optimal strategy, regardless of which strategy the other player chooses. This is not true, in 
general, for games in which either player has more than two moves. 


EXAMPLE 3 Using Theorem 10.7.2 


The federal government desires to inoculate its citizens against a certain flu virus. The virus has two 
strains, and the proportions in which the two strains occur in the virus population are not known. Two 
vaccines have been developed and each citizen is given only one of them. Vaccine 1 is 85% effective 
against strain 1 and 70% effective against strain 2. Vaccine 2 is 60% effective against strain 1 and 90% 
effective against strain 2. What inoculation policy should the government adopt? 


Solution We can consider this a two-person game in which player R (the government) desires to make 
the payoff (the fraction of citizens resistant to the virus) as large as possible, and player C (the virus) 
desires to make the payoff as small as possible. The payoff matrix is 


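$$\begin{array}{c|cc} & \text{Strain 1} & \text{Strain 2} \\ \hline \text{Vaccine 1} & .85 & .70 \\ \text{Vaccine 2} & .60 & .90 \end{array}$$

This matrix has no saddle points, so Theorem 10.7.2 is applicable. Consequently, 

$$p_1^* = \frac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{.90 - .60}{.85 + .90 - .70 - .60} = \frac{.30}{.45} = \frac{2}{3}$$
$$p_2^* = 1 - p_1^* = 1 - \frac{2}{3} = \frac{1}{3}$$
$$q_1^* = \frac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{.90 - .70}{.85 + .90 - .70 - .60} = \frac{.20}{.45} = \frac{4}{9}$$
$$q_2^* = 1 - q_1^* = 1 - \frac{4}{9} = \frac{5}{9}$$
$$v = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{(.85)(.90) - (.70)(.60)}{.85 + .90 - .70 - .60} = \frac{.345}{.45} = .7666\ldots$$

Thus, the optimal strategy for the government is to inoculate 2/3 of the citizens with vaccine 1 and 1/3 of the 
citizens with vaccine 2. This will guarantee that about 76.7% of the citizens will be resistant to a virus 
attack regardless of the distribution of the two strains. 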


In contrast, a virus distribution of 4/9 of strain 1 and 5/9 of strain 2 will result in the same 76.7% of resistant 
citizens, regardless of the inoculation strategy adopted by the government (see Exercise 7). 
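
The formulas of Theorem 10.7.2 are easy to evaluate mechanically. The following sketch applies them to the vaccine game of Example 3; the function name `optimal_2x2` and the helper `payoff_against_column` are our own, and the code assumes the game has no saddle point, so that the common denominator is nonzero. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F

def optimal_2x2(a11, a12, a21, a22):
    """Optimal strategies and value of a 2 x 2 game via Theorem 10.7.2.

    Assumes the game is not strictly determined (no saddle point).
    """
    d = a11 + a22 - a12 - a21
    p_star = [(a22 - a21) / d, (a11 - a12) / d]
    q_star = [(a22 - a12) / d, (a11 - a21) / d]
    value = (a11 * a22 - a12 * a21) / d
    return p_star, q_star, value

# Payoff matrix of Example 3 (rows: vaccines, columns: strains).
A = [[F(85, 100), F(70, 100)],
     [F(60, 100), F(90, 100)]]
p_star, q_star, v = optimal_2x2(A[0][0], A[0][1], A[1][0], A[1][1])

def payoff_against_column(p, A, j):
    """Expected payoff when R plays p and C always plays column j (0-based)."""
    return sum(p[i] * A[i][j] for i in range(len(p)))

print(p_star[0], p_star[1])  # 2/3 1/3
print(q_star[0], q_star[1])  # 4/9 5/9
print(v)                     # 23/30
```

The value 23/30 is the .7666... of the example. Evaluating `payoff_against_column(p_star, A, 0)` and `payoff_against_column(p_star, A, 1)` gives the same number, which illustrates Equation 14: once player R uses p*, player C cannot change the expected payoff.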


Exercise Set 10.7 


1. Suppose that a game has a payoff matrix 


(a) If players R and C use strategies 


fle Ble Ble Blo 


respectively, what is the expected payoff of the game? 


(b) If player C keeps his strategy fixed as in part (a), what strategy should player R choose to maximize his expected 
payoff? 


(c) If player R keeps her strategy fixed as in part (a), what strategy should player C choose to minimize the expected 
payoff to player R? 


Answer: 


(a) —5/8 
(b) [0 1 0] 
(c) [1 0 0 0]^T 


2. Construct a simple example to show that optimal strategies are not necessarily unique. For example, find a payoff 
matrix with several equal saddle points. 


Answer: 


Let $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$, for example. 


3. For the strictly determined games with the following payoff matrices, find optimal strategies for the two players, and 
find the values of the games. 


(a) }> 2 
7 3 
(b) | —3 =—2 
2 
—4 1 
()| 2 —2 0 
=—6 0 —5 
5 2 3 
(d) | =—3 2 —1 
—2 —1 5 
—4 1 0 
—3 4 6 
Answer: 


(a) p* = [0 1], q* = [0 1]^T, v = 3 


4. For the 2 × 2 games with the following payoff matrices, find optimal strategies for the two players, and find the 
values of the games. 


(a) 6 3 
-1 4 

(b) | 40 20 
-10 30 


(a) 


LnlDo UnJi bs mith ale oa|~) cal 


ale 
Sol, 10] 41 8 — 29 
=| at te to 13 
13 


5. Player R has two playing cards: a black ace and a red four. Player C also has two cards: a black two and a red three. 
Each player secretly selects one of his or her cards. If both selected cards are the same color, player C pays player R 
the sum of the face values in dollars. If the cards are different colors, player R pays player C the sum of the face 
values. What are optimal strategies for both players, and what is the value of the game? 


Answer: 


i 
[BB] -/2 | oa 
20 
6. Verify Equations 6, 7, and 8. 
7. Verify the statement in the last paragraph of Example 3. 
8. Show that the entries of the optimal strategies p* and q* given in Theorem 10.7.2 are numbers strictly between zero 
and one. 
Section 10.7 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific 
calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for 
the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your 
technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology 
utility to solve many of the problems in the regular exercise sets. 


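T1. Consider a game between two players where each player can make up to n different moves (n > 1). If the ith move 
of player R and the jth move of player C are such that i + j is even, then C pays R $1. If i + j is odd, then R pays C $1. 
Assume that both players have the same strategy, that is, $\mathbf{p}_n = [\rho_i]_{1 \times n}$ and $\mathbf{q}_n = [\rho_i]_{n \times 1}$, where 
$\rho_1 + \rho_2 + \rho_3 + \cdots + \rho_n = 1$. Use a computer to show that 

$$\begin{aligned} E(\mathbf{p}_2, \mathbf{q}_2) &= (\rho_1 - \rho_2)^2 \\ E(\mathbf{p}_3, \mathbf{q}_3) &= (\rho_1 - \rho_2 + \rho_3)^2 \\ E(\mathbf{p}_4, \mathbf{q}_4) &= (\rho_1 - \rho_2 + \rho_3 - \rho_4)^2 \\ E(\mathbf{p}_5, \mathbf{q}_5) &= (\rho_1 - \rho_2 + \rho_3 - \rho_4 + \rho_5)^2 \end{aligned}$$

Using these results as a guide, prove in general that the expected payoff to player R is 

$$E(\mathbf{p}_n, \mathbf{q}_n) = \left( \sum_{j=1}^{n} (-1)^{j+1} \rho_j \right)^2 \ge 0$$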


which shows that in the long run, player R will not lose in this game. 
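
One way to carry out the "use a computer" step of Exercise T1 is to compare the double sum defining E(p, q) with the squared alternating sum for particular strategies. The sketch below is a numerical spot check, not a proof; the function names and the sample strategy are our own choices.

```python
from fractions import Fraction as F

def parity_game_payoff(rho):
    """E(p, q) when both players use rho and a_ij = +1 if i + j is even, -1 if odd."""
    n = len(rho)
    # Indices here are 0-based; i + j has the same parity as (i + 1) + (j + 1).
    return sum(rho[i] * rho[j] * (1 if (i + j) % 2 == 0 else -1)
               for i in range(n) for j in range(n))

def squared_alternating_sum(rho):
    """(rho_1 - rho_2 + rho_3 - ...)^2, the closed form suggested by the exercise."""
    return sum((-1) ** j * r for j, r in enumerate(rho)) ** 2

rho = [F(1, 2), F(1, 3), F(1, 6)]  # a sample strategy with n = 3
print(parity_game_payoff(rho), squared_alternating_sum(rho))  # 1/9 1/9
```

Agreement for a handful of strategies suggests the identity; the general proof follows from writing the payoff entries as $a_{ij} = (-1)^{i+j} = (-1)^i(-1)^j$, which lets the double sum factor.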

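T2. Consider a game between two players where each player can make up to n different moves (n > 1). If both players 
make the same move, then player C pays player R $(n − 1). However, if both players make different moves, then 
player R pays player C $1. Assume that both players have the same strategy, that is, $\mathbf{p}_n = [\rho_i]_{1 \times n}$ and $\mathbf{q}_n = [\rho_i]_{n \times 1}$, 
where $\rho_1 + \rho_2 + \rho_3 + \cdots + \rho_n = 1$. Use a computer to show that 

$$E(\mathbf{p}_2, \mathbf{q}_2) = \tfrac{1}{2}(\rho_1 - \rho_1)^2 + \tfrac{1}{2}(\rho_1 - \rho_2)^2 + \tfrac{1}{2}(\rho_2 - \rho_1)^2 + \tfrac{1}{2}(\rho_2 - \rho_2)^2$$

$$\begin{aligned} E(\mathbf{p}_3, \mathbf{q}_3) = {} & \tfrac{1}{2}(\rho_1 - \rho_1)^2 + \tfrac{1}{2}(\rho_1 - \rho_2)^2 + \tfrac{1}{2}(\rho_1 - \rho_3)^2 \\ & + \tfrac{1}{2}(\rho_2 - \rho_1)^2 + \tfrac{1}{2}(\rho_2 - \rho_2)^2 + \tfrac{1}{2}(\rho_2 - \rho_3)^2 \\ & + \tfrac{1}{2}(\rho_3 - \rho_1)^2 + \tfrac{1}{2}(\rho_3 - \rho_2)^2 + \tfrac{1}{2}(\rho_3 - \rho_3)^2 \end{aligned}$$

$$\begin{aligned} E(\mathbf{p}_4, \mathbf{q}_4) = {} & \tfrac{1}{2}(\rho_1 - \rho_1)^2 + \tfrac{1}{2}(\rho_1 - \rho_2)^2 + \tfrac{1}{2}(\rho_1 - \rho_3)^2 + \tfrac{1}{2}(\rho_1 - \rho_4)^2 \\ & + \tfrac{1}{2}(\rho_2 - \rho_1)^2 + \tfrac{1}{2}(\rho_2 - \rho_2)^2 + \tfrac{1}{2}(\rho_2 - \rho_3)^2 + \tfrac{1}{2}(\rho_2 - \rho_4)^2 \\ & + \tfrac{1}{2}(\rho_3 - \rho_1)^2 + \tfrac{1}{2}(\rho_3 - \rho_2)^2 + \tfrac{1}{2}(\rho_3 - \rho_3)^2 + \tfrac{1}{2}(\rho_3 - \rho_4)^2 \\ & + \tfrac{1}{2}(\rho_4 - \rho_1)^2 + \tfrac{1}{2}(\rho_4 - \rho_2)^2 + \tfrac{1}{2}(\rho_4 - \rho_3)^2 + \tfrac{1}{2}(\rho_4 - \rho_4)^2 \end{aligned}$$

Using these results as a guide, prove in general that the expected payoff to player R is 

$$E(\mathbf{p}_n, \mathbf{q}_n) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\rho_i - \rho_j)^2 \ge 0$$
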


which shows that in the long run, player R will not lose in this game. 
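
The identity in Exercise T2 can be spot-checked in the same spirit. This is a sketch under our own naming, and a numerical check for sample strategies rather than a proof: it compares the expected payoff computed from the payoff matrix with half the sum of squared differences.

```python
from fractions import Fraction as F

def diagonal_game_payoff(rho):
    """E(p, q) when both players use rho, with a_ii = n - 1 and a_ij = -1 for i != j."""
    n = len(rho)
    return sum(rho[i] * rho[j] * ((n - 1) if i == j else -1)
               for i in range(n) for j in range(n))

def half_sum_of_squared_differences(rho):
    """(1/2) * sum over all ordered pairs (i, j) of (rho_i - rho_j)^2."""
    return F(1, 2) * sum((a - b) ** 2 for a in rho for b in rho)

rho = [F(1, 2), F(1, 3), F(1, 6)]  # a sample strategy with n = 3
print(diagonal_game_payoff(rho), half_sum_of_squared_differences(rho))  # 1/6 1/6
```

Both expressions are zero when every move is equally likely, which is the only case in which player R's long-run advantage disappears.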




10.8 Leontief Economic Models 


In this section we discuss two linear models for economic systems. Some results about nonnegative matrices are applied to determine 
equilibrium price structures and outputs necessary to satisfy demand. 


Prerequisites 


Linear Systems 
Matrices 


Economic Systems 


Matrix theory has been very successful in describing the interrelations among prices, outputs, and demands in economic systems. In 
this section we discuss some simple models based on the ideas of Nobel laureate Wassily Leontief. We examine two different but 
related models: the closed or input-output model, and the open or production model. In each, we are given certain economic 
parameters that describe the interrelations between the “industries” in the economy under consideration. Using matrix theory, we then 
evaluate certain other parameters, such as prices or output levels, in order to satisfy a desired economic objective. We begin with the 
closed model. 


Leontief Closed (Input-Output) Model 
First we present a simple example; then we proceed to the general theory of the model. 
EXAMPLE 1 An Input-Output Model 


Three homeowners—a carpenter, an electrician, and a plumber—agree to make repairs in their three homes. They agree 
to work a total of 10 days each according to the following schedule: 


                                              Work Performed by 
                                       Carpenter   Electrician   Plumber 
Days of Work in Home of Carpenter          2            1           6 
Days of Work in Home of Electrician        4            5           1 
Days of Work in Home of Plumber            4            4           3 

For tax purposes, they must report and pay each other a reasonable daily wage, even for the work each does on his or her 
own home. Their normal daily wages are about $100, but they agree to adjust their respective daily wages so that each 
homeowner will come out even, that is, so that the total amount paid out by each is the same as the total amount each 
receives. We can set 

p1 = daily wage of carpenter 
p2 = daily wage of electrician 
p3 = daily wage of plumber 

To satisfy the “equilibrium” condition that each homeowner comes out even, we require that 
total expenditures = total income 


for each of the homeowners for the 10-day period. For example, the carpenter pays a total of 2p1 + p2 + 6p3 for the 
repairs in his own home and receives a total income of 10p1 for the repairs that he performs on all three homes. 
Equating these two expressions then gives the first of the following three equations: 

$$\begin{aligned} 2p_1 + p_2 + 6p_3 &= 10p_1 \\ 4p_1 + 5p_2 + p_3 &= 10p_2 \\ 4p_1 + 4p_2 + 3p_3 &= 10p_3 \end{aligned}$$


The remaining two equations are the equilibrium equations for the electrician and the plumber. Dividing these equations 
by 10 and rewriting them in matrix form yields 


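$$\begin{bmatrix} .2 & .1 & .6 \\ .4 & .5 & .1 \\ .4 & .4 & .3 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} \tag{1}$$

Equation 1 can be rewritten as a homogeneous system by subtracting the left side from the right side to obtain 

$$\begin{bmatrix} .8 & -.1 & -.6 \\ -.4 & .5 & -.1 \\ -.4 & -.4 & .7 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

The solution of this homogeneous system is found to be (verify) 

$$\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} = s \begin{bmatrix} 31 \\ 32 \\ 36 \end{bmatrix}$$

where s is an arbitrary constant. This constant is a scale factor, which the homeowners may choose for their 
convenience. For example, they can set s = 3 so that the corresponding daily wages ($93, $96, and $108) are about 
$100. 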
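
The "(verify)" step in Example 1 can be done by substituting the proposed wages into the equilibrium condition. The following is a minimal sketch with names of our own choosing (`E`, `mat_vec`); it checks that the wages obtained with s = 3 satisfy Ep = p exactly and that each column of the coefficient matrix sums to 1.

```python
from fractions import Fraction as F

# Coefficient matrix of Equation 1: entry (i, j) is the fraction of worker j's
# ten days that is spent in home i.
E = [[F(2, 10), F(1, 10), F(6, 10)],
     [F(4, 10), F(5, 10), F(1, 10)],
     [F(4, 10), F(4, 10), F(3, 10)]]

def mat_vec(M, v):
    """Product of a matrix (list of rows) with a column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

s = 3
p = [s * w for w in (31, 32, 36)]  # wages of carpenter, electrician, plumber

print(p)                   # [93, 96, 108]
print(mat_vec(E, p) == p)  # True: each homeowner's expenditures equal income
```

Any other positive scale factor s passes the same check, which reflects the fact that only the ratios of the wages are determined by the equilibrium condition.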


This example illustrates the salient features of the Leontief input-output model of a closed economy. In the basic Equation 1, each 
column sum of the coefficient matrix is 1, corresponding to the fact that each of the homeowners' “output” of labor is completely 
distributed among these same homeowners in the proportions given by the entries in the column. Our problem is to determine suitable 
“prices” for these outputs so as to put the system in equilibrium—that is, so that each homeowner's total expenditures equal his or her 
total income. 


In the general model we have an economic system consisting of a finite number of “industries,” which we number as industries 

1, 2, ..., k. Over some fixed period of time, each industry produces an “output” of some good or service that is completely utilized in a 
predetermined manner by the k industries. An important problem is to find suitable “prices” to be charged for these k outputs so that 
for each industry, total expenditures equal total income. Such a price structure represents an equilibrium position for the economy. 


For the fixed time period in question, let us set 
$p_i$ = price charged by the ith industry for its total output 
$e_{ij}$ = fraction of the total output of the jth industry purchased by the ith industry 


for i, j = 1, 2, ..., k. By definition, we have 


(i) $p_i \ge 0$, i = 1, 2, ..., k 
(ii) $e_{ij} \ge 0$, i, j = 1, 2, ..., k 
(iii) $e_{1j} + e_{2j} + \cdots + e_{kj} = 1$, j = 1, 2, ..., k 
With these quantities, we form the price vector 

$$\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_k \end{bmatrix}$$

and the exchange matrix or input-output matrix 

$$E = \begin{bmatrix} e_{11} & e_{12} & \cdots & e_{1k} \\ e_{21} & e_{22} & \cdots & e_{2k} \\ \vdots & \vdots & & \vdots \\ e_{k1} & e_{k2} & \cdots & e_{kk} \end{bmatrix}$$


Condition (iii) expresses the fact that all the column sums of the exchange matrix are 1. 


As in the example, in order that the expenditures of each industry be equal to its income, the following matrix equation must be 
satisfied [see Equation 1]: 

$$E\mathbf{p} = \mathbf{p} \tag{2}$$

or 

$$(I - E)\mathbf{p} = \mathbf{0} \tag{3}$$

Equation 3 is a homogeneous linear system for the price vector p. It will have a nontrivial solution if and only if the determinant of its 
coefficient matrix I − E is zero. In Exercise 7 we ask you to show that this is the case for any exchange matrix E. Thus, Equation 3 always has 
nontrivial solutions for the price vector p. 
Actually, for our economic model to make sense, we need more than just the fact that Equation 3 has nontrivial solutions for p. We also need the 
prices $p_i$ of the k outputs to be nonnegative numbers. We express this condition as p ≥ 0. (In general, if A is any vector or matrix, the 
notation A ≥ 0 means that every entry of A is nonnegative, and the notation A > 0 means that every entry of A is positive. Similarly, 
A ≥ B means A − B ≥ 0, and A > B means A − B > 0.) To show that Equation 3 has a nontrivial solution for which p ≥ 0 is a bit more difficult 
than showing merely that some nontrivial solution exists. But it is true, and we state this fact without proof in the following theorem. 


THEOREM 10.8.1 


If E is an exchange matrix, then Ep = p always has a nontrivial solution p whose entries are nonnegative. 


Let us consider a few simple examples of this theorem. 
EXAMPLE 2 Using Theorem 10.8.1 


Let 


Then (I − E)p = 0 is 


which has the general solution 


where s is an arbitrary constant. We then have nontrivial solutions p ≥ 0 for any s > 0. 


EXAMPLE 3 Using Theorem 10.8.1 


Let 


Then (I − E)p = 0 has the general solution 


where s and t are independent arbitrary constants. Nontrivial solutions p ≥ 0 then result from any s ≥ 0 and t ≥ 0, not 
both zero. 


Example 2 indicates that in some situations one of the prices must be zero in order to satisfy the equilibrium condition. Example 3 
indicates that there may be several linearly independent price structures available. Neither of these situations describes a truly 
interdependent economic structure. The following theorem gives sufficient conditions for both cases to be excluded. 


THEOREM 10.8.2 


Let E be an exchange matrix such that for some positive integer m all the entries of $E^m$ are positive. Then there is exactly one 
linearly independent solution of (I − E)p = 0, and it may be chosen so that all its entries are positive. 


We will not give a proof of this theorem. If you have read Section 10.5 on Markov chains, observe that this theorem is essentially the 
same as Theorem 10.5.4. What we are calling exchange matrices in this section were called stochastic or Markov matrices in Section 
10.5. 


EXAMPLE 4 Using Theorem 10.8.2 


The exchange matrix in Example 1 was 

$$E = \begin{bmatrix} .2 & .1 & .6 \\ .4 & .5 & .1 \\ .4 & .4 & .3 \end{bmatrix}$$

Because E > 0, the condition $E^m > 0$ in Theorem 10.8.2 is satisfied for m = 1. Consequently, we are guaranteed that 
there is exactly one linearly independent solution of (I − E)p = 0, and it can be chosen so that p > 0. In that example, 
we found that 

$$\mathbf{p} = \begin{bmatrix} 31 \\ 32 \\ 36 \end{bmatrix}$$


is such a solution. 
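
The hypothesis of Theorem 10.8.2 can be tested by forming successive powers of E until one has all positive entries. The sketch below is illustrative only: the function names and the cutoff `max_power` are our own, and the second and third matrices are hypothetical exchange matrices chosen to show the other possible outcomes.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def first_positive_power(E, max_power=50):
    """Smallest m <= max_power with every entry of E^m positive, else None."""
    P = E
    for m in range(1, max_power + 1):
        if all(x > 0 for row in P for x in row):
            return m
        P = mat_mul(P, E)
    return None

E1 = [[F(2, 10), F(1, 10), F(6, 10)],   # exchange matrix of Example 1
      [F(4, 10), F(5, 10), F(1, 10)],
      [F(4, 10), F(4, 10), F(3, 10)]]
E2 = [[F(0), F(1, 2)],                  # hypothetical: has a zero entry
      [F(1), F(1, 2)]]
I2 = [[F(1), F(0)],                     # hypothetical: identity matrix
      [F(0), F(1)]]

print(first_positive_power(E1))  # 1
print(first_positive_power(E2))  # 2
print(first_positive_power(I2))  # None
```

A result of `None` only means that no positive power was found up to the cutoff; for the identity matrix no power is ever positive, so the theorem gives no information there.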


Leontief Open (Production) Model 


In contrast with the closed model, in which the outputs of k industries are distributed only among themselves, the open model attempts 
to satisfy an outside demand for the outputs. Portions of these outputs can still be distributed among the industries themselves, to keep 
them operating, but there is to be some excess, some net production, with which to satisfy the outside demand. In the closed model the 
outputs of the industries are fixed, and our objective is to determine prices for these outputs so that the equilibrium condition, that 
expenditures equal incomes, is satisfied. In the open model it is the prices that are fixed, and our objective is to determine levels of the 
outputs of the industries needed to satisfy the outside demand. We will measure the levels of the outputs in terms of their economic 
values using the fixed prices. To be precise, over some fixed period of time, let 


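$x_i$ = monetary value of the total output of the ith industry 
$d_i$ = monetary value of the output of the ith industry needed to satisfy the outside demand 
$c_{ij}$ = monetary value of the output of the ith industry needed by the jth industry to produce one unit of monetary value of its own output 


With these quantities, we define the production vector 

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}$$

the demand vector 

$$\mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_k \end{bmatrix}$$

and the consumption matrix 

$$C = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1k} \\ c_{21} & c_{22} & \cdots & c_{2k} \\ \vdots & \vdots & & \vdots \\ c_{k1} & c_{k2} & \cdots & c_{kk} \end{bmatrix}$$

By their nature, we have that 

$$\mathbf{x} \ge 0, \quad \mathbf{d} \ge 0, \quad \text{and} \quad C \ge 0$$

From the definition of $c_{ij}$ and $x_j$, it can be seen that the quantity 

$$c_{i1}x_1 + c_{i2}x_2 + \cdots + c_{ik}x_k$$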


is the value of the output of the ith industry needed by all k industries to produce a total output specified by the production vector x. 
Because this quantity is simply the ith entry of the column vector Cx, we can say further that the ith entry of the column vector 

$$\mathbf{x} - C\mathbf{x}$$

is the value of the excess output of the ith industry available to satisfy the outside demand. The value of the outside demand for the 
output of the ith industry is the ith entry of the demand vector d. Consequently, we are led to the following equation 

$$\mathbf{x} - C\mathbf{x} = \mathbf{d}$$

or 

$$(I - C)\mathbf{x} = \mathbf{d} \tag{4}$$

for the demand to be exactly met, without any surpluses or shortages. Thus, given C and d, our objective is to find a production vector 
x ≥ 0 that satisfies Equation 4. 


EXAMPLE 5 Production Vector for a Town 


A town has three main industries: a coal-mining operation, an electric power-generating plant, and a local railroad. To 
mine $1 of coal, the mining operation must purchase $.25 of electricity to run its equipment and $.25 of transportation 
for its shipping needs. To produce $1 of electricity, the generating plant requires $.65 of coal for fuel, $.05 of its own 
electricity to run auxiliary equipment, and $.05 of transportation. To provide $1 of transportation, the railroad requires 
$.55 of coal for fuel and $.10 of electricity for its auxiliary equipment. In a certain week the coal-mining operation 
receives orders for $50,000 of coal from outside the town, and the generating plant receives orders for $25,000 of 
electricity from outside. There is no outside demand for the local railroad. How much must each of the three industries 
produce in that week to exactly satisfy their own demand and the outside demand? 


Solution For the one-week period let 
x1 = value of total output of coal-mining operation 
x2 = value of total output of power-generating plant 
x3 = value of total output of local railroad 


From the information supplied, the consumption matrix of the system is 


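$$C = \begin{bmatrix} 0 & .65 & .55 \\ .25 & .05 & .10 \\ .25 & .05 & 0 \end{bmatrix}$$

The linear system (I − C)x = d is then 

$$\begin{bmatrix} 1.00 & -.65 & -.55 \\ -.25 & .95 & -.10 \\ -.25 & -.05 & 1.00 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 50{,}000 \\ 25{,}000 \\ 0 \end{bmatrix}$$

The coefficient matrix on the left is invertible, and the solution is given by 

$$\mathbf{x} = (I - C)^{-1}\mathbf{d} = \frac{1}{503} \begin{bmatrix} 756 & 542 & 470 \\ 220 & 690 & 190 \\ 200 & 170 & 630 \end{bmatrix} \begin{bmatrix} 50{,}000 \\ 25{,}000 \\ 0 \end{bmatrix} = \begin{bmatrix} 102{,}087 \\ 56{,}163 \\ 28{,}330 \end{bmatrix}$$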
200 170 630 0 28, 330 


Thus, the total output of the coal-mining operation should be $102,087, the total output of the power-generating plant 
should be $56,163, and the total output of the railroad should be $28,330. 
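
The production vector in Example 5 can be reproduced by solving (I − C)x = d directly. The following is a minimal sketch, assuming exact rational arithmetic and a small Gauss-Jordan routine of our own (`solve`); it is meant to illustrate the computation, not to serve as a general-purpose solver.

```python
from fractions import Fraction as F

def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination (M square and invertible)."""
    n = len(M)
    aug = [list(row) + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

# Consumption matrix and outside demand of Example 5 (coal, electricity, railroad).
C = [[F(0), F(65, 100), F(55, 100)],
     [F(25, 100), F(5, 100), F(10, 100)],
     [F(25, 100), F(5, 100), F(0)]]
d = [F(50000), F(25000), F(0)]

I_minus_C = [[(1 if i == j else 0) - C[i][j] for j in range(3)] for i in range(3)]
x = solve(I_minus_C, d)

print([round(v) for v in x])  # [102087, 56163, 28330]
```

The exact solution has denominator 503, which is why the factor 1/503 appears in front of the inverse matrix in the example.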


Let us reconsider Equation 4: 
$$(I - C)\mathbf{x} = \mathbf{d}$$

If the square matrix I − C is invertible, we can write 

$$\mathbf{x} = (I - C)^{-1}\mathbf{d} \tag{5}$$

In addition, if the matrix $(I - C)^{-1}$ has only nonnegative entries, then we are guaranteed that for any d ≥ 0, Equation 5 has a unique 
nonnegative solution for x. This is a particularly desirable situation, as it means that any outside demand can be met. The terminology 
used to describe this case is given in the following definition. 


DEFINITION 1 


A consumption matrix C is said to be productive if $(I - C)^{-1}$ exists and 

$$(I - C)^{-1} \ge 0$$


We will now consider some simple criteria that guarantee that a consumption matrix is productive. The first is given in the following 
theorem. 


THEOREM 10.8.3 Productive Consumption Matrix 


A consumption matrix C is productive if and only if there is some production vector x ≥ 0 such that x > Cx. 


(The proof is outlined in Exercise 9.) The condition x > Cx means that there is some production schedule possible such that each 
industry produces more than it consumes. 


Theorem 10.8.3 has two interesting corollaries. Suppose that all the row sums of C are less than 1. If 

$$\mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$$

then Cx is a column vector whose entries are these row sums. Therefore, x > Cx, and the condition of Theorem 10.8.3 is satisfied. 
Thus, we arrive at the following corollary: 


COROLLARY 10.8.4 


A consumption matrix is productive if each of its row sums is less than 1. 


As we ask you to show in Exercise 8, this corollary leads to the following: 


COROLLARY 10.8.5 


A consumption matrix is productive if each of its column sums is less than 1. 


Recalling the definition of the entries of the consumption matrix C, we see that the jth column sum of C is the total value of the outputs 
of all k industries needed to produce one unit of value of output of the jth industry. The jth industry is thus said to be profitable if that 
jth column sum is less than 1. In other words, Corollary 10.8.5 says that a consumption matrix is productive if all k industries in the 
economic system are profitable. 


EXAMPLE 6 Using Corollary 10.8.5 


The consumption matrix in Example 5 was 


$$C = \begin{bmatrix} 0 & .65 & .55 \\ .25 & .05 & .10 \\ .25 & .05 & 0 \end{bmatrix}$$

All three column sums in this matrix are less than 1, so all three industries are profitable. Consequently, by Corollary 
10.8.5, the consumption matrix C is productive. This can also be seen in the calculations in Example 5, as $(I - C)^{-1}$ is 
nonnegative. 
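
Corollaries 10.8.4 and 10.8.5 give a quick test that needs only row and column sums. The sketch below applies it to the consumption matrix of Example 5; the function names are our own, and the test is sufficient but not necessary, so a result of `False` does not show that a matrix fails to be productive.

```python
from fractions import Fraction as F

def row_sums(C):
    return [sum(row) for row in C]

def column_sums(C):
    return [sum(col) for col in zip(*C)]

def passes_sum_test(C):
    """True if all row sums or all column sums of C are less than 1."""
    return (all(s < 1 for s in row_sums(C))
            or all(s < 1 for s in column_sums(C)))

C = [[F(0), F(65, 100), F(55, 100)],
     [F(25, 100), F(5, 100), F(10, 100)],
     [F(25, 100), F(5, 100), F(0)]]

print([float(s) for s in column_sums(C)])  # [0.5, 0.75, 0.65]
print(passes_sum_test(C))                  # True
```

Here the first row sum is 1.2, so Corollary 10.8.4 does not apply, but every column sum is below 1 and Corollary 10.8.5 does.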


Exercise Set 10.8 


1. For the following exchange matrices, find nonnegative price vectors that satisfy the equilibrium condition 3. 


(a) $\begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{3} \\ \tfrac{1}{2} & \tfrac{2}{3} \end{bmatrix}$ 

(b) | 1 1 
2° 2 
ee 
a 
1 
7 10 

(c) $\begin{bmatrix} .35 & .50 & .30 \\ .25 & .20 & .30 \\ .40 & .30 & .40 \end{bmatrix}$ 


Answer: 


(a) 
(b) | 


(c) | 78 


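2. Using Theorem 10.8.3 and its corollaries, show that each of the following consumption matrices is productive. 


(a) $\begin{bmatrix} .8 & .1 \\ .3 & .6 \end{bmatrix}$ 

(b) $\begin{bmatrix} .70 & .30 & .25 \\ .20 & .40 & .25 \\ .05 & .15 & .25 \end{bmatrix}$ 

(c) $\begin{bmatrix} .7 & .3 & .2 \\ .1 & .4 & .3 \\ .2 & .4 & .1 \end{bmatrix}$ 

Answer: 


(a) Use Corollary 10.8.4; all row sums are less than one. 

(b) Use Corollary 10.8.5; all column sums are less than one. 

(c) Use Theorem 10.8.3, with $\mathbf{x} = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} > C\mathbf{x} = \begin{bmatrix} 1.9 \\ .9 \\ .9 \end{bmatrix}$. 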


3. Using Theorem 10.8.2, show that there is only one linearly independent price vector for the closed economic system with exchange 
matrix 


An thy 
SoU 


Answer: 


$E^2$ has all positive entries. 


4. Three neighbors have backyard vegetable gardens. Neighbor A grows tomatoes, neighbor B grows corn, and neighbor C grows 
lettuce. They agree to divide their crops among themselves as follows: A gets 1/2 of the tomatoes, 1/3 of the corn, and 1/4 of the 
lettuce. B gets 1/3 of the tomatoes, 1/3 of the corn, and 1/4 of the lettuce. C gets 1/6 of the tomatoes, 1/3 of the corn, and 1/2 of the lettuce. 


What prices should the neighbors assign to their respective crops if the equilibrium condition of a closed economy is to be satisfied, 
and if the lowest-priced crop is to have a price of $100? 


Answer: 


Price of tomatoes, $120.00; price of corn, $100.00; price of lettuce, $106.67 


5. Three engineers (a civil engineer (CE), an electrical engineer (EE), and a mechanical engineer (ME)) each have a consulting firm. 
The consulting they do is of a multidisciplinary nature, so they buy a portion of each others' services. For each $1 of consulting the 
CE does, she buys $.10 of the EE's services and $.30 of the ME's services. For each $1 of consulting the EE does, she buys $.20 of 
the CE's services and $.40 of the ME's services. And for each $1 of consulting the ME does, she buys $.30 of the CE's services and 
$.40 of the EE's services. In a certain week the CE receives outside consulting orders of $500, the EE receives outside consulting 
orders of $700, and the ME receives outside consulting orders of $600. What dollar amount of consulting does each engineer 
perform in that week? 


Answer: 


$1256 for the CE, $1448 for the EE, $1556 for the ME 


6. (a) Suppose that the demand $d_i$ for the output of the ith industry increases by one unit. Explain why the ith column of the matrix 
$(I - C)^{-1}$ is the increase that must be made to the production vector x to satisfy this additional demand. 


(b) Referring to Example 5, use the result in part (a) to determine the increase in the value of the output of the coal-mining 
operation needed to satisfy a demand of one additional unit in the value of the output of the power-generating plant. 


Answer: 
(b) 242 
503 


7. Using the fact that the column sums of an exchange matrix E are all 1, show that the column sums of \(I - E\) are zero. From this,
show that \(I - E\) has zero determinant, and so \((I - E)p = 0\) has nontrivial solutions for p.


8. Show that Corollary 10.8.5 follows from Corollary 10.8.4. 
[Hint: Use the fact that \((A^T)^{-1} = (A^{-1})^T\) for any invertible matrix A.]


9. (Calculus required) Prove Theorem 10.8.3 as follows: 


(a) Prove the “only if” part of the theorem; that is, show that if C is a productive consumption matrix, then there is a vector \(x \ge 0\)
such that \(x > Cx\).


(b) Prove the “if” part of the theorem as follows:
Step 1 Show that if there is a vector \(x^* \ge 0\) such that \(Cx^* < x^*\), then \(x^* > 0\).
Step 2 Show that there is a number \(\lambda\) such that \(0 < \lambda < 1\) and \(Cx^* < \lambda x^*\).
Step 3 Show that \(C^n x^* < \lambda^n x^*\) for n = 1, 2, ....
Step 4 Show that \(C^n \to 0\) as \(n \to \infty\).
Step 5 By multiplying out, show that
\[ (I - C)(I + C + C^2 + \cdots + C^{n-1}) = I - C^n \]
for n = 1, 2, ....
Step 6 By letting \(n \to \infty\) in Step 5, show that the matrix infinite sum
\[ S = I + C + C^2 + \cdots \]
exists and that \((I - C)S = I\).
Step 7 Show that \(S \ge 0\) and that \(S = (I - C)^{-1}\).


Step 8 Show that C is a productive consumption matrix. 


Section 10.8 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, 
Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra 
capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of 
these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in 
these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. 


T1. Consider a sequence of exchange matrices \(\{E_2, E_3, E_4, E_5, \ldots, E_n\}\), where


oe iL: 
4 033 
2 1 
ee F3=|1 0 1], 
ee 2 
2 joist 
3. 3 
Pas Yano is lo 
Oe aes 2s 49'S 
23 4 ro 111 
11 3 4 5 
bee ig na 1 oes | 
fe=| a it B=[92° 4 5 
eae 1 1 
oo to 1 
i 4 3 5 
O76: bt 
3 4 ooo 11 
4 5 


and so on. Use a computer to show that \(E_2^2 > 0_2\), \(E_3^3 > 0_3\), \(E_4^4 > 0_4\), \(E_5^5 > 0_5\), and make the conjecture that although \(E_n^n > 0_n\) is true,
\(E_n^k > 0_n\) is not true for k = 1, 2, 3, ..., n − 1. Next, use a computer to determine the vectors \(p_n\) such that \(E_n p_n = p_n\) (for n = 2, 3, 4,
5, 6), and then see if you can discover a pattern that would allow you to compute \(p_{n+1}\) easily from \(p_n\). Test your discovery by first
constructing \(p_8\) from
\[ p_7 = \begin{bmatrix} 2520 \\ 3360 \\ 1890 \\ 672 \\ 175 \\ 36 \\ 7 \end{bmatrix} \]
and then checking to see whether \(E_8 p_8 = p_8\).
T2. Consider an open production model having n industries with n > 1. In order to produce $1 of its own output, the jth industry must
spend $(1/n) for the output of the ith industry (for all i ≠ j), but the jth industry (for all j = 1, 2, 3, ..., n) spends nothing for its own
output. Construct the consumption matrix \(C_n\), show that it is productive, and determine an expression for \((I_n - C_n)^{-1}\). In
determining an expression for \((I_n - C_n)^{-1}\), use a computer to study the cases when n = 2, 3, 4, and 5; then make a conjecture and
prove your conjecture to be true. [Hint: If \(F_n = [1]_{n \times n}\) (i.e., the n × n matrix with every entry equal to 1), first show that
\[ F_n^2 = nF_n \]
and then express your value of \((I_n - C_n)^{-1}\) in terms of n, \(I_n\), and \(F_n\).]




10.9 Forest Management 


In this section we discuss a matrix model for the management of a forest where trees are grouped into classes according to height.
The optimal sustainable yield of a periodic harvest is calculated when the trees of different height classes can have different
economic values.


Prerequisites 


Matrix Operations 


Optimal Sustainable Yield 


Our objective is to introduce a simplified model for the sustainable harvesting of a forest whose trees are classified by height. The 
height of a tree is assumed to determine its economic value when it is cut down and sold. Initially, there is a distribution of trees 
of various heights. The forest is then allowed to grow for a certain period of time, after which some of the trees of various heights 
are harvested. The trees left unharvested are to be of the same height configuration as the original forest, so that the harvest is 
sustainable. As we will see, there are many such sustainable harvesting procedures. We want to find one for which the total 
economic value of all the trees removed is as large as possible. This determines the optimal sustainable yield of the forest and is 
the largest yield that can be attained continually without depleting the forest. 


The Model 


Suppose that a harvester has a forest of Douglas fir trees that are to be sold as Christmas trees year after year. Every December 
the harvester cuts down some of the trees to be sold. For each tree cut down, a seedling is planted in its place. In this way the total 
number of trees in the forest is always the same. (In this simplified model, we will not take into account trees that die between 
harvests. We assume that every seedling planted survives and grows until it is harvested.) 


In the marketplace, trees of different heights have different economic values. Suppose that there are n different price classes
corresponding to certain height intervals, as shown in Table 1 and Figure 10.9.1. The first class consists of seedlings with heights
in the interval \([0, h_1)\), and these seedlings are of no economic value. The nth class consists of trees with heights greater than or
equal to \(h_{n-1}\).


Figure 10.9.1 [Value of tree plotted against height of tree.]


Table 1 [Value (dollars) of the trees in each height class.]


Let \(x_i\) (i = 1, 2, ..., n) be the number of trees within the ith class that remain after each harvest. We form a column vector with
the numbers and call it the nonharvest vector:
\[ x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \]

For a sustainable harvesting policy, the forest is to be returned after each harvest to the fixed configuration given by the 
nonharvest vector x. Part of our problem is to find those nonharvest vectors x for which sustainable harvesting is possible. 


Because the total number of trees in the forest is fixed, we can set
\[ x_1 + x_2 + \cdots + x_n = s \tag{1} \]
where s is predetermined by the amount of land available and the amount of space each tree requires. Referring to Figure 10.9.2,
we have the following situation. The forest configuration is given by the vector x after each harvest. Between harvests the trees
grow and produce a new forest configuration before each harvest. A certain number of trees are removed from each class at the
harvest. Finally, a seedling is planted in place of each tree removed, to return the forest again to the configuration x.


Figure 10.9.2 [Cycle diagram: forest before growth (nonharvest vector x); forest after growth; trees removed and trees not removed; forest after harvest (nonharvest vector x), the same forest configuration.]


Consider first the growth of the forest between harvests. During this period a tree in the ith class may grow and move up to a
higher height class. Or its growth may be retarded for some reason, and it will remain in the same class. We consequently define
the following growth parameters \(g_i\) for i = 1, 2, ..., n − 1:


\(g_i\) = the fraction of trees in the ith class that grow into the (i + 1)-st class during a growth period


For simplicity we assume that a tree can move at most one height class upward in one growth period. With this assumption, we
have


\(1 - g_i\) = the fraction of trees in the ith class that remain in the ith class during a growth period


With these n − 1 growth parameters, we form the following n × n growth matrix:


\[
G = \begin{bmatrix}
1-g_1 & 0 & 0 & \cdots & 0 & 0 \\
g_1 & 1-g_2 & 0 & \cdots & 0 & 0 \\
0 & g_2 & 1-g_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1-g_{n-1} & 0 \\
0 & 0 & 0 & \cdots & g_{n-1} & 1
\end{bmatrix} \tag{2}
\]
Because the entries of the vector x are the numbers of trees in the n classes before the growth period, you can verify that the
entries of the vector
\[
Gx = \begin{bmatrix}
(1-g_1)x_1 \\
g_1x_1 + (1-g_2)x_2 \\
g_2x_2 + (1-g_3)x_3 \\
\vdots \\
g_{n-2}x_{n-2} + (1-g_{n-1})x_{n-1} \\
g_{n-1}x_{n-1} + x_n
\end{bmatrix} \tag{3}
\]
are the numbers of trees in the n classes after the growth period.
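As an illustration of Equations 2 and 3, the following minimal sketch builds a growth matrix and applies it to a nonharvest vector. The helper names (`growth_matrix`, `mat_vec`) and the three-class numbers are assumptions made for this example only; they do not come from the text.

```python
from fractions import Fraction


def growth_matrix(g):
    """Build the n x n growth matrix of Equation 2 from g_1, ..., g_{n-1}."""
    n = len(g) + 1
    G = [[Fraction(0)] * n for _ in range(n)]
    for i, g_i in enumerate(g):
        G[i][i] = 1 - g_i          # fraction that stays in class i+1
        G[i + 1][i] = g_i          # fraction that moves up one class
    G[n - 1][n - 1] = Fraction(1)  # trees in the tallest class stay there
    return G


def mat_vec(A, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


# Hypothetical three-class forest (values chosen only for illustration).
g = [Fraction(1, 2), Fraction(1, 4)]
x = [Fraction(400), Fraction(200), Fraction(0)]

after_growth = mat_vec(growth_matrix(g), x)
print(after_growth)  # class counts after one growth period
```

Because every column of G sums to 1, the total number of trees is unchanged by the growth step; only their distribution over the height classes changes.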
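Suppose that during the harvest we remove \(y_i\) (i = 1, 2, ..., n) trees from the ith class. We will call the column vector
\[ y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \]
the harvest vector. Thus, a total of
\[ y_1 + y_2 + \cdots + y_n \]
trees are removed at each harvest. This is also the total number of trees added to the first class (the new seedlings) after each
harvest. If we define the following n × n replacement matrix
\[
R = \begin{bmatrix}
1 & 1 & \cdots & 1 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix} \tag{4}
\]
then the column vector
\[
Ry = \begin{bmatrix} y_1 + y_2 + \cdots + y_n \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{5}
\]
specifies the configuration of trees planted after each harvest.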


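At this point we are ready to write the following equation, which characterizes a sustainable harvesting policy:


[configuration at end of growth period] − [harvest] + [new seedling replacement] = [configuration at beginning of growth period]


or mathematically,
\[ Gx - y + Ry = x \]
This equation can be rewritten as
\[ (I - R)y = (G - I)x \tag{6} \]
or more comprehensively as
\[
\begin{bmatrix}
0 & -1 & -1 & \cdots & -1 & -1 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 \\
0 & 0 & 0 & \cdots & 0 & 1
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}
=
\begin{bmatrix}
-g_1 & 0 & 0 & \cdots & 0 & 0 \\
g_1 & -g_2 & 0 & \cdots & 0 & 0 \\
0 & g_2 & -g_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -g_{n-1} & 0 \\
0 & 0 & 0 & \cdots & g_{n-1} & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{n-1} \\ x_n \end{bmatrix}
\]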

We will refer to Equation 6 as the sustainable harvesting condition. Any vectors x and y with nonnegative entries, and such that
\(x_1 + x_2 + \cdots + x_n = s\), which satisfy this matrix equation, determine a sustainable harvesting policy for the forest. Note that
if \(y_1 > 0\), then the harvester is removing seedlings of no economic value and replacing them with new seedlings. Because there is
no point in doing this, we assume that


\[ y_1 = 0 \tag{7} \]


With this assumption, it can be verified that 6 is the matrix form of the following set of equations:


\[
\begin{aligned}
y_2 + y_3 + \cdots + y_n &= g_1x_1 \\
y_2 &= g_1x_1 - g_2x_2 \\
y_3 &= g_2x_2 - g_3x_3 \\
&\;\;\vdots \\
y_{n-1} &= g_{n-2}x_{n-2} - g_{n-1}x_{n-1} \\
y_n &= g_{n-1}x_{n-1}
\end{aligned} \tag{8}
\]
Note that the first equation in 8 is the sum of the remaining n − 1 equations.
Because we must have \(y_i \ge 0\) for i = 2, 3, ..., n, Equations 8 require that
\[ g_1x_1 \ge g_2x_2 \ge \cdots \ge g_{n-1}x_{n-1} \ge 0 \tag{9} \]


Conversely, if x is a column vector with nonnegative entries that satisfy Equation 9, then 7 and 8 define a column vector y with
nonnegative entries. Furthermore, x and y then satisfy the sustainable harvesting condition 6. In other words, a necessary and
sufficient condition for a nonnegative column vector x to determine a forest configuration that is capable of sustainable
harvesting is that its entries satisfy 9.
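The test in Equation 9 and the construction of y from Equations 7 and 8 can be sketched as follows. The function name `harvest_vector` and the sample numbers are hypothetical and serve only to illustrate the condition.

```python
from fractions import Fraction


def harvest_vector(g, x):
    """Return the harvest vector y of Equations 7 and 8, or None if the
    nonnegative vector x violates the sustainability condition (Equation 9)."""
    n = len(x)
    if min(x) < 0:
        return None
    flows = [g[i] * x[i] for i in range(n - 1)]   # g_1 x_1, ..., g_{n-1} x_{n-1}
    # Equation 9: the products g_i x_i must be nonincreasing and nonnegative.
    if any(a < b for a, b in zip(flows, flows[1:])) or flows[-1] < 0:
        return None
    y = [Fraction(0)] * n                         # y_1 = 0 (Equation 7)
    for i in range(1, n - 1):
        y[i] = flows[i - 1] - flows[i]            # interior rows of Equation 8
    y[n - 1] = flows[n - 2]                       # last row of Equation 8
    return y


# Hypothetical three-class forest (values chosen only for illustration).
g = [Fraction(1, 2), Fraction(1, 4)]
y_ok = harvest_vector(g, [Fraction(400), Fraction(200), Fraction(0)])
y_bad = harvest_vector(g, [Fraction(100), Fraction(400), Fraction(100)])
print(y_ok, y_bad)
```

In the first call the products are 200 and 50, so the configuration is sustainable; in the second they are 50 and 100, which violates Equation 9, and no nonnegative harvest vector exists.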


Optimal Sustainable Yield 


Because we remove \(y_i\) trees from the ith class (i = 2, 3, ..., n) and each tree in the ith class has an economic value of \(p_i\), the
total yield of the harvest, Yld, is given by


\[ \mathrm{Yld} = p_2y_2 + p_3y_3 + \cdots + p_ny_n \tag{10} \]


Using 8, we may substitute for the \(y_i\)'s in 10 to obtain


\[ \mathrm{Yld} = p_2g_1x_1 + (p_3 - p_2)g_2x_2 + \cdots + (p_n - p_{n-1})g_{n-1}x_{n-1} \tag{11} \]


Combining 11, 1, and 9, we can now state the problem of maximizing the yield of the forest over all possible sustainable 
harvesting policies as follows: 


Problem 


Find nonnegative numbers \(x_1, x_2, \ldots, x_n\) that maximize
\[ \mathrm{Yld} = p_2g_1x_1 + (p_3 - p_2)g_2x_2 + \cdots + (p_n - p_{n-1})g_{n-1}x_{n-1} \]
subject to
\[ x_1 + x_2 + \cdots + x_n = s \]
and
\[ g_1x_1 \ge g_2x_2 \ge \cdots \ge g_{n-1}x_{n-1} \ge 0 \]


As formulated above, this problem belongs to the field of linear programming. However, we will illustrate the following result, 
without linear programming theory, by actually exhibiting a sustainable harvesting policy. 


THEOREM 10.9.1 Optimal Sustainable Yield 


The optimal sustainable yield is achieved by harvesting all the trees from one particular height class and none of the trees 
from any other height class. 


Let us first set
\(\mathrm{Yld}_k\) = yield obtained by harvesting all of the kth class and none of the other classes


The largest value of \(\mathrm{Yld}_k\) for k = 2, 3, ..., n will then be the optimal sustainable yield, and the corresponding value of k will be
the class that should be completely harvested to attain the optimal sustainable yield. Because no class but the kth is harvested, we
have
\[ y_2 = y_3 = \cdots = y_{k-1} = y_{k+1} = \cdots = y_n = 0 \tag{12} \]
In addition, because all of the kth class is harvested, no trees are ever present in the height classes above the kth class. Thus,
\[ x_k = x_{k+1} = \cdots = x_n = 0 \tag{13} \]


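Substituting 12 and 13 into the sustainable harvesting condition 8 gives


\[
\begin{aligned}
y_k &= g_1x_1 \\
0 &= g_1x_1 - g_2x_2 \\
0 &= g_2x_2 - g_3x_3 \\
&\;\;\vdots \\
0 &= g_{k-2}x_{k-2} - g_{k-1}x_{k-1} \\
y_k &= g_{k-1}x_{k-1}
\end{aligned} \tag{14}
\]


Equations 14 can also be written as


\[ y_k = g_1x_1 = g_2x_2 = \cdots = g_{k-1}x_{k-1} \tag{15} \]


from which it follows that


\[
\begin{aligned}
x_2 &= g_1x_1/g_2 \\
x_3 &= g_1x_1/g_3 \\
&\;\;\vdots \\
x_{k-1} &= g_1x_1/g_{k-1}
\end{aligned} \tag{16}
\]
If we substitute Equations 13 and 16 into
\[ x_1 + x_2 + \cdots + x_n = s \]
[which is Equation 1], we can solve for \(x_1\) and obtain
\[ x_1 = \frac{s}{1 + \dfrac{g_1}{g_2} + \dfrac{g_1}{g_3} + \cdots + \dfrac{g_1}{g_{k-1}}} \tag{17} \]


For the yield \(\mathrm{Yld}_k\), we combine 10, 12, 15, and 17 to obtain
\[
\begin{aligned}
\mathrm{Yld}_k &= p_2y_2 + p_3y_3 + \cdots + p_ny_n \\
&= p_ky_k \\
&= p_kg_1x_1 \\
&= \frac{p_ks}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}}
\end{aligned} \tag{18}
\]


Equation 18 determines \(\mathrm{Yld}_k\) in terms of the known growth and economic parameters for any k = 2, 3, ..., n. Thus, the optimal
sustainable yield is found as follows.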


THEOREM 10.9.2 Finding the Optimal Sustainable Yield 


The optimal sustainable yield is the largest value of
\[ \frac{p_ks}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}} \]
for k = 2, 3, ..., n. The corresponding value of k is the number of the class that is completely harvested.


In Exercise 4 we ask you to show that the nonharvest vector x for the optimal sustainable yield is


\[
x = \frac{s}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}}
\begin{bmatrix} 1/g_1 \\ 1/g_2 \\ \vdots \\ 1/g_{k-1} \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{19}
\]


Theorem 10.9.2 implies that it is not necessarily the highest-priced class of trees that should be totally cropped. The growth
parameters \(g_i\) must also be taken into account to determine the optimal sustainable yield.


EXAMPLE 1 Using Theorem 10.9.2


For a Scots pine forest in Scotland with a growth period of six years, the following growth matrix was found (see
M. B. Usher, “A Matrix Approach to the Management of Renewable Resources, with Special Reference to Selection
Forests,” Journal of Applied Ecology, vol. 3, 1966, pp. 355-367):


\[
G = \begin{bmatrix}
.72 & 0 & 0 & 0 & 0 & 0 \\
.28 & .69 & 0 & 0 & 0 & 0 \\
0 & .31 & .75 & 0 & 0 & 0 \\
0 & 0 & .25 & .77 & 0 & 0 \\
0 & 0 & 0 & .23 & .63 & 0 \\
0 & 0 & 0 & 0 & .37 & 1.00
\end{bmatrix}
\]


Suppose that the prices of trees in the five tallest height classes are
\[ p_2 = \$50, \quad p_3 = \$100, \quad p_4 = \$150, \quad p_5 = \$200, \quad p_6 = \$250 \]
Which class should be completely harvested to obtain the optimal sustainable yield, and what is that yield?


Solution From matrix G we have that
\[ g_1 = .28, \quad g_2 = .31, \quad g_3 = .25, \quad g_4 = .23, \quad g_5 = .37 \]
Equation 18 then gives
\[
\begin{aligned}
\mathrm{Yld}_2 &= 50s/(.28^{-1}) = 14.0s \\
\mathrm{Yld}_3 &= 100s/(.28^{-1} + .31^{-1}) = 14.7s \\
\mathrm{Yld}_4 &= 150s/(.28^{-1} + .31^{-1} + .25^{-1}) = 13.9s \\
\mathrm{Yld}_5 &= 200s/(.28^{-1} + .31^{-1} + .25^{-1} + .23^{-1}) = 13.2s \\
\mathrm{Yld}_6 &= 250s/(.28^{-1} + .31^{-1} + .25^{-1} + .23^{-1} + .37^{-1}) = 14.0s
\end{aligned}
\]


We see that \(\mathrm{Yld}_3\) is the largest of these five quantities, so from Theorem 10.9.2 the third class should be completely
harvested every six years to maximize the sustainable yield. The corresponding optimal sustainable yield is $14.7s,
where s is the total number of trees in the forest.
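The arithmetic of Example 1 can be checked with a short script. This is only a sketch of Equations 18 and 19; the function names `sustainable_yields` and `nonharvest_vector`, and the choice s = 1000 in the last line, are assumptions introduced for illustration.

```python
def sustainable_yields(prices, g, s=1.0):
    """Return {k: Yld_k} for k = 2, ..., n using Equation 18.

    prices[j] is p_{j+2} (classes 2..n); g[j] is g_{j+1}.
    """
    yields = {}
    inv_sum = 0.0
    for k, (p_k, g_prev) in enumerate(zip(prices, g), start=2):
        inv_sum += 1.0 / g_prev          # running sum 1/g_1 + ... + 1/g_{k-1}
        yields[k] = p_k * s / inv_sum
    return yields


def nonharvest_vector(g, k, s=1.0):
    """Return the nonharvest vector of Equation 19 when class k is harvested."""
    inv = [1.0 / g_i for g_i in g[:k - 1]]
    total = sum(inv)
    n = len(g) + 1
    return [s * v / total for v in inv] + [0.0] * (n - (k - 1))


# Data of Example 1 (yields are per tree, i.e. s = 1).
g = [0.28, 0.31, 0.25, 0.23, 0.37]
prices = [50, 100, 150, 200, 250]

ylds = sustainable_yields(prices, g)
best_class = max(ylds, key=ylds.get)
print({k: round(v, 1) for k, v in ylds.items()}, best_class)

x_opt = nonharvest_vector(g, best_class, s=1000)
```

The entries of `x_opt` add up to s, as Equation 1 requires, and only the classes below the harvested one are occupied.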


Exercise Set 10.9 


1. A certain forest is divided into three height classes and has a growth matrix between harvests given by 


Bf 

x 0 0 
=|2 2 
G=|5 7 0 

2 

$i 


If the price of trees in the second class is $30 and the price of trees in the third class is $50, which class should be completely 
harvested to attain the optimal sustainable yield? What is the optimal yield if there are 1000 trees in the forest? 


Answer: 


The second class; $15,000 


2. In Example 1, to what level must the price of trees in the fifth class rise so that the fifth class is the one to harvest completely 
in order to attain the optimal sustainable yield? 


Answer: 


$223 
3. In Example 1, what must the ratio of the prices \(p_2 : p_3 : p_4 : p_5 : p_6\) be in order that the yields \(\mathrm{Yld}_k\), k = 2, 3, 4, 5, 6, all be the
same? (In this case, any sustainable harvesting policy will produce the same optimal sustainable yield.)


Answer: 


1:1.90:3.02:4.24:5.00 
4. Derive Equation 19 for the nonharvest vector x corresponding to the optimal sustainable harvesting policy described in 
Theorem 10.9.2. 


5. For the optimal sustainable harvesting policy described in Theorem 10.9.2, how many trees are removed from the forest 
during each harvest? 


Answer: 


\[ s/(g_1^{-1} + g_2^{-1} + \cdots + g_{k-1}^{-1}) \]


6. If all the growth parameters \(g_1, g_2, \ldots, g_{n-1}\) in the growth matrix G are equal, what should the ratio of the prices
\(p_2 : p_3 : \cdots : p_n\) be in order that any sustainable harvesting policy be an optimal sustainable harvesting policy? (See Exercise 3.)


Answer: 


\(1 : 2 : 3 : \cdots : n - 1\)


Section 10.9 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, 
Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some 
linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are 
using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have 
mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the 
regular exercise sets. 


T1. A particular forest has growth parameters given by
\[ g_i = \frac{1}{i} \]
for i = 1, 2, 3, ..., n − 1, where n (the total number of height classes) can be chosen as large as needed. Suppose that the value of
a tree in the kth height interval is given by
\[ p_k = a(k - 1)^{\rho} \]
where a is a constant (in dollars) and ρ is a parameter satisfying \(1 \le \rho < 2\).
(a) Show that the yield \(\mathrm{Yld}_k\) is given by
\[ \mathrm{Yld}_k = \frac{2a(k - 1)^{\rho - 1}s}{k} \]
(b) For
ρ = 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9
use a computer to determine the class number that should be completely harvested, and determine the optimal sustainable
yield in each case. Make sure that you allow k to take on only integer values in your calculations.


(c) Repeat the calculations in part (b) using
ρ = 1.91, 1.92, 1.93, 1.94, 1.95, 1.96, 1.97, 1.98, 1.99


(d) Show that if ρ = 2, then the optimal sustainable yield can never be larger than 2as.
(e) Compare the values of k determined in parts (b) and (c) to 1/(2 − ρ), and use some calculus to explain why
\[ k \approx \frac{1}{2 - \rho} \]

T2. A particular forest has growth parameters given by
\[ g_i = \frac{1}{2^i} \]
for i = 1, 2, 3, ..., n − 1, where n (the total number of height classes) can be chosen as large as needed. Suppose that the value of
a tree in the kth height interval is given by
\[ p_k = a(k - 1)^{\rho} \]
where a is a constant (in dollars) and ρ is a parameter satisfying \(1 \le \rho\).
(a) Show that the yield \(\mathrm{Yld}_k\) is given by
\[ \mathrm{Yld}_k = \frac{a(k - 1)^{\rho}s}{2^k - 2} \]
(b) For
ρ = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
use a computer to determine the class number that should be completely harvested in order to obtain an optimal yield, and
determine the optimal sustainable yield in each case. Make sure that you allow k to take on only integer values in your
calculations.


(c) Compare the values of k determined in part (b) to 1 + ρ/ln(2) and use some calculus to explain why
\[ k \approx 1 + \frac{\rho}{\ln(2)} \]




10.10 Computer Graphics 


In this section we assume that a view of a three-dimensional object is displayed on a video screen and show 
how matrix algebra can be used to obtain new views of the object by rotation, translation, and scaling. 


Prerequisites 


Matrix Algebra 
Analytic Geometry 


Visualization of a Three-Dimensional Object 


Suppose that we want to visualize a three-dimensional object by displaying various views of it on a video 
screen. The object we have in mind to display is to be determined by a finite number of straight line segments. 
As an example, consider the truncated right pyramid with hexagonal base illustrated in Figure 10.10.1. We first 
introduce an xyz-coordinate system in which to embed the object. As in Figure 10.10.1, we orient the coordinate 
system so that its origin is at the center of the video screen and the xy-plane coincides with the plane of the 
screen. Consequently, an observer will see only the projection of the view of the three-dimensional object onto 
the two-dimensional xy-plane. 


Figure 10.10.1 


In the xyz-coordinate system, the endpoints \(P_1, P_2, \ldots, P_n\) of the straight line segments that determine the view
of the object will have certain coordinates—say,
\[ (x_1, y_1, z_1), \quad (x_2, y_2, z_2), \quad \ldots, \quad (x_n, y_n, z_n) \]


These coordinates, together with a specification of which pairs are to be connected by straight line segments,
are to be stored in the memory of the video display system. For example, assume that the 12 vertices of the
truncated pyramid in Figure 10.10.1 have the following coordinates (the screen is 4 units wide by 3 units high):


P1: (1.000, −.800, .000),    P2: (.500, −.800, −.866),
P3: (−.500, −.800, −.866),   P4: (−1.000, −.800, .000),
P5: (−.500, −.800, .866),    P6: (.500, −.800, .866),
P7: (.840, −.400, .000),     P8: (.315, .125, −.546),
P9: (−.210, .650, −.364),    P10: (−.360, .800, .000),
P11: (−.210, .650, .364),    P12: (.315, .125, .546)


These 12 vertices are connected pairwise by 18 straight line segments as follows, where \(P_i \leftrightarrow P_j\) denotes that
point \(P_i\) is connected to point \(P_j\):

P1 ↔ P2, P2 ↔ P3, P3 ↔ P4, P4 ↔ P5, P5 ↔ P6, P6 ↔ P1,

P7 ↔ P8, P8 ↔ P9, P9 ↔ P10, P10 ↔ P11, P11 ↔ P12, P12 ↔ P7,

P1 ↔ P7, P2 ↔ P8, P3 ↔ P9, P4 ↔ P10, P5 ↔ P11, P6 ↔ P12
In View 1 these 18 straight line segments are shown as they would appear on the video screen. It should be
noticed that only the x- and y-coordinates of the vertices are needed by the video display system to draw the
view, because only the projection of the object onto the xy-plane is displayed. However, we must keep track of
the z-coordinates to carry out certain transformations discussed later.


We now show how to form new views of the object by scaling, translating, or rotating the initial view. We first
construct a 3 × n matrix P, referred to as the coordinate matrix of the view, whose columns are the coordinates
of the n points of a view:


\[
P = \begin{bmatrix}
x_1 & x_2 & \cdots & x_n \\
y_1 & y_2 & \cdots & y_n \\
z_1 & z_2 & \cdots & z_n
\end{bmatrix}
\]


For example, the coordinate matrix P corresponding to View 1 is the 3 × 12 matrix


\[
\begin{bmatrix}
1.000 & .500 & -.500 & -1.000 & -.500 & .500 & .840 & .315 & -.210 & -.360 & -.210 & .315 \\
-.800 & -.800 & -.800 & -.800 & -.800 & -.800 & -.400 & .125 & .650 & .800 & .650 & .125 \\
.000 & -.866 & -.866 & .000 & .866 & .866 & .000 & -.546 & -.364 & .000 & .364 & .546
\end{bmatrix}
\]


We will show below how to transform the coordinate matrix P of a view to a new coordinate matrix P' 
corresponding to a new view of the object. The straight line segments connecting the various points move with 
the points as they are transformed. In this way, each view is uniquely determined by its coordinate matrix once 
we have specified which pairs of points in the original view are to be connected by straight lines. 
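One way to hold this information in a program is sketched below: a list of vertices, a list of index pairs for the connecting segments, and the 3 × n coordinate matrix built from the vertices. The variable names and the helper `projected_segments` are assumptions made for this illustration; the numbers are those of View 1.

```python
# Vertex coordinates of View 1, listed in the order P1, ..., P12.
vertices = [
    (1.000, -0.800, 0.000), (0.500, -0.800, -0.866),
    (-0.500, -0.800, -0.866), (-1.000, -0.800, 0.000),
    (-0.500, -0.800, 0.866), (0.500, -0.800, 0.866),
    (0.840, -0.400, 0.000), (0.315, 0.125, -0.546),
    (-0.210, 0.650, -0.364), (-0.360, 0.800, 0.000),
    (-0.210, 0.650, 0.364), (0.315, 0.125, 0.546),
]

# Connection list with 1-based vertex numbers:
# base hexagon, top hexagon, then the six side edges.
edges = ([(i, i % 6 + 1) for i in range(1, 7)]
         + [(i + 6, i % 6 + 7) for i in range(1, 7)]
         + [(i, i + 6) for i in range(1, 7)])

# 3 x n coordinate matrix P: one column per vertex.
P = [list(row) for row in zip(*vertices)]


def projected_segments(P, edges):
    """Return the 2-D line segments seen on the screen (xy-projection)."""
    xs, ys = P[0], P[1]
    return [((xs[i - 1], ys[i - 1]), (xs[j - 1], ys[j - 1])) for i, j in edges]


segments = projected_segments(P, edges)
print(len(segments), segments[0])
```

Only the first two rows of P are used for drawing; the third row is carried along so that later transformations can use the z-coordinates.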


Scaling 


The first type of transformation we consider consists of scaling a view along the x, y, and z directions by factors
of α, β, and γ, respectively. By this we mean that if a point \(P_i\) has coordinates \((x_i, y_i, z_i)\) in the original view, it
is to move to a new point \(P_i'\) with coordinates \((\alpha x_i, \beta y_i, \gamma z_i)\) in the new view. This has the effect of
transforming a unit cube in the original view to a rectangular parallelepiped of dimensions α × β × γ (Figure
10.10.2). Mathematically, this may be accomplished with matrix multiplication as follows. Define a 3 × 3
diagonal matrix
\[
S = \begin{bmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & \gamma \end{bmatrix}
\]


Then, if a point \(P_i\) in the original view is represented by the column vector
\[ \begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix} \]
then the transformed point \(P_i'\) is represented by the column vector
\[
\begin{bmatrix} x_i' \\ y_i' \\ z_i' \end{bmatrix}
= \begin{bmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & \gamma \end{bmatrix}
\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix}
\]


Using the coordinate matrix P, which contains the coordinates of all n points of the original view as its columns,
we can transform these n points simultaneously to produce the coordinate matrix P′ of the scaled view, as
follows:


\[
SP = \begin{bmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & \gamma \end{bmatrix}
\begin{bmatrix} x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \\ z_1 & z_2 & \cdots & z_n \end{bmatrix}
= \begin{bmatrix} \alpha x_1 & \alpha x_2 & \cdots & \alpha x_n \\ \beta y_1 & \beta y_2 & \cdots & \beta y_n \\ \gamma z_1 & \gamma z_2 & \cdots & \gamma z_n \end{bmatrix} = P'
\]


The new coordinate matrix can then be entered into the video display system to produce the new view of the
object. As an example, View 2 is View 1 scaled by setting α = 1.8, β = 0.5, and γ = 3.0. Note that the scaling
γ = 3.0 along the z-axis is not visible in View 2, since we see only the projection of the object onto the
xy-plane.
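The product SP can be sketched in a few lines. The helpers `mat_mul` and `scaling_matrix` are hypothetical names, and for brevity only the first three vertices of View 1 are used as columns of P.

```python
def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def scaling_matrix(alpha, beta, gamma):
    """Diagonal matrix S that scales x, y, z by alpha, beta, gamma."""
    return [[alpha, 0.0, 0.0],
            [0.0, beta, 0.0],
            [0.0, 0.0, gamma]]


# Columns are the vertices P1, P2, P3 of View 1.
P = [[1.000, 0.500, -0.500],
     [-0.800, -0.800, -0.800],
     [0.000, -0.866, -0.866]]

# Scaling factors used for View 2.
P_scaled = mat_mul(scaling_matrix(1.8, 0.5, 3.0), P)
print(P_scaled)
```

Each row of P is simply multiplied by its own factor, which is why a change in γ alters only the third row and leaves the displayed xy-projection unchanged.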


Figure 10.10.2


View 2 View 1 scaled by α = 1.8, β = 0.5, γ = 3.0.


Translation


We next consider the transformation of translating or displacing an object to a new position on the screen.
Referring to Figure 10.10.3, suppose we desire to change an existing view so that each point \(P_i\) with
coordinates \((x_i, y_i, z_i)\) moves to a new point \(P_i'\) with coordinates \((x_i + x_0, y_i + y_0, z_i + z_0)\). The vector
\[ \begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix} \]
is called the translation vector of the transformation. By defining a 3 × n matrix T as


\[
T = \begin{bmatrix}
x_0 & x_0 & \cdots & x_0 \\
y_0 & y_0 & \cdots & y_0 \\
z_0 & z_0 & \cdots & z_0
\end{bmatrix}
\]


we can translate all n points of the view determined by the coordinate matrix P by matrix addition via the
equation
\[ P' = P + T \]

The coordinate matrix P′ then specifies the new coordinates of the n points. For example, if we wish to
translate View 1 according to the translation vector
\[ \begin{bmatrix} 1.2 \\ 0.4 \\ 1.7 \end{bmatrix} \]
the result is View 3. Note, again, that the translation \(z_0 = 1.7\) along the z-axis does not show up explicitly in
View 3.
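The addition P + T amounts to adding the same offset to every entry of a row. A minimal sketch, using a hypothetical helper `translate` and the first two vertices of View 1, is:

```python
def translate(P, t):
    """Return P + T, where every column of T equals the translation vector t."""
    return [[entry + t_k for entry in row] for row, t_k in zip(P, t)]


# Columns are the vertices P1 and P2 of View 1.
P = [[1.000, 0.500],
     [-0.800, -0.800],
     [0.000, -0.866]]

# Translation vector used for View 3.
P_translated = translate(P, (1.2, 0.4, 1.7))
print(P_translated)
```

The matrix T is never formed explicitly here; adding `t_k` to each entry of row k gives the same result as the matrix sum.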


View 3 View 1 translated by \(x_0 = 1.2\), \(y_0 = 0.4\), \(z_0 = 1.7\).


Figure 10.10.3 [Point \(P_i(x_i, y_i, z_i)\) translated to \(P_i'(x_i + x_0, y_i + y_0, z_i + z_0)\).]


In Exercise 7, a technique of performing translations by matrix multiplication rather than by matrix addition is 
explained. 


Rotation 


A more complicated type of transformation is a rotation of a view about one of the three coordinate axes. We
begin with a rotation about the z-axis (the axis perpendicular to the screen) through an angle θ. Given a point \(P_i\)
in the original view with coordinates \((x_i, y_i, z_i)\), we wish to compute the new coordinates \((x_i', y_i', z_i')\) of the
rotated point \(P_i'\). Referring to Figure 10.10.4 and using a little trigonometry, you should be able to derive the
following:

\[
\begin{aligned}
x_i' &= \rho\cos(\phi + \theta) = \rho\cos\phi\cos\theta - \rho\sin\phi\sin\theta = x_i\cos\theta - y_i\sin\theta \\
y_i' &= \rho\sin(\phi + \theta) = \rho\cos\phi\sin\theta + \rho\sin\phi\cos\theta = x_i\sin\theta + y_i\cos\theta \\
z_i' &= z_i
\end{aligned}
\]
These equations can be written in matrix form as

\[
\begin{bmatrix} x_i' \\ y_i' \\ z_i' \end{bmatrix}
= \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix}
\]


If we let R denote the 3 × 3 matrix in this equation, all n points can be rotated by the matrix product P′ = RP to
yield the coordinate matrix P′ of the rotated view.
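A rotation about the z-axis can be sketched as follows. The function names are hypothetical, and angles are taken in degrees to match the text.

```python
import math


def rotation_z(theta_deg):
    """Rotation matrix about the z-axis through theta_deg degrees."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Columns are the points (1, 0, 0) and (0, 1, 2).
P = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 2.0]]

P_rotated = mat_mul(rotation_z(90), P)
print([[round(v, 3) for v in row] for row in P_rotated])
```

A 90° rotation sends (1, 0, 0) to (0, 1, 0) and (0, 1, 2) to (−1, 0, 2), up to floating-point rounding; the z-coordinates are unchanged.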


Figure 10.10.4


Rotations about the x- and y-axes can be accomplished analogously, and the resulting rotation matrices are 
given with Views 4, 5, and 6. These three new views of the truncated pyramid correspond to rotations of View 1 
about the x-, y-, and z-axes, respectively, each through an angle of 90°. 


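Rotation about the x-axis


\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}
\]


View 4 View 1 rotated 90° about the x-axis.


Rotation about the y-axis


\[
\begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}
\]


View 5 View 1 rotated 90° about the y-axis.


Rotation about the z-axis


\[
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]


View 6 View 1 rotated 90° about the z-axis.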


Rotations about three coordinate axes may be combined to give oblique views of an object. For example, View
7 is View 1 rotated first about the x-axis through 30°, then about the y-axis through −70°, and finally about the
z-axis through −27°. Mathematically, these three successive rotations can be embodied in the single
transformation equation P′ = RP, where R is the product of three individual rotation matrices:
\[
R_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(30°) & -\sin(30°) \\ 0 & \sin(30°) & \cos(30°) \end{bmatrix}
\]
\[
R_2 = \begin{bmatrix} \cos(-70°) & 0 & \sin(-70°) \\ 0 & 1 & 0 \\ -\sin(-70°) & 0 & \cos(-70°) \end{bmatrix}
\]
\[
R_3 = \begin{bmatrix} \cos(-27°) & -\sin(-27°) & 0 \\ \sin(-27°) & \cos(-27°) & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

in the order

\[
R = R_3R_2R_1 = \begin{bmatrix} .305 & -.025 & -.952 \\ -.155 & .985 & -.076 \\ .940 & .171 & .296 \end{bmatrix}
\]


View 7 Oblique view of truncated pyramid.
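The product \(R = R_3R_2R_1\) can be checked numerically. The sketch below uses hypothetical helper names and rounds the result to three decimals for comparison with the matrix given for View 7.

```python
import math


def rotation_x(deg):
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rotation_y(deg):
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rotation_z(deg):
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


R1 = rotation_x(30)     # applied first
R2 = rotation_y(-70)    # applied second
R3 = rotation_z(-27)    # applied last

# The rotation applied first stands rightmost in the product.
R = mat_mul(R3, mat_mul(R2, R1))
print([[round(v, 3) for v in row] for row in R])
```

Because matrix multiplication is not commutative, multiplying the same three matrices in a different order generally produces a different view.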


As a final illustration, in View 8 we have two separate views of the truncated pyramid, which constitute a
stereoscopic pair. They were produced by first rotating View 7 about the y-axis through an angle of −3° and
translating it to the right, then rotating the same View 7 about the y-axis through an angle of +3° and
translating it to the left. The translation distances were chosen so that the stereoscopic views are about 2½
inches apart—the approximate distance between a pair of eyes.


View 8 Stereoscopic figure of truncated pyramid. The three-dimensionality of the diagram can be seen 
by holding the book about one foot away and focusing on a distant object. Then by shifting your 
gaze to View 8 without refocusing, you can make the two views of the stereoscopic pair merge 
together and produce the desired effect. 


Exercise Set 10.10 


1. View 9 is a view of a square with vertices (0, 0, 0), (1, 0, 0), (1, 1, 0), and (0, 1, 0). 
(a) What is the coordinate matrix of View 9? 


(b) What is the coordinate matrix of View 9 after it is scaled by a factor 1½ in the x-direction and ½ in the
y-direction? Draw a sketch of the scaled view.
(c) What is the coordinate matrix of View 9 after it is translated by the following vector? 
—2 
=] 
2 


Draw a sketch of the translated view. 


(d) What is the coordinate matrix of View 9 after it is rotated through an angle of −30° about the z-axis?
Draw a sketch of the rotated view.


Ex-View 9 Square with vertices (0, 0, 0), (1, 0, 0), (1, 1, 0), and (0, 1, 0) (Exercises 1 and 2) 


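Answer:
(a)
\[ \begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]
(b)
\[ \begin{bmatrix} 0 & \tfrac{3}{2} & \tfrac{3}{2} & 0 \\ 0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} \\ 0 & 0 & 0 & 0 \end{bmatrix} \]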
Fp uy a oe 
al «1 0 0 
. 2 2 3 


(d)
\[ \begin{bmatrix} 0 & .866 & 1.366 & .500 \\ 0 & -.500 & .366 & .866 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

2. (a) If the coordinate matrix of View 9 is multiplied by the matrix


\[ \begin{bmatrix} 1 & \tfrac{1}{2} & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]


the result is the coordinate matrix of View 10. Such a transformation is called a shear in the x-direction
with factor ½ with respect to the y-coordinate. Show that under such a transformation, a point with
coordinates \((x_i, y_i, z_i)\) has new coordinates \((x_i + \tfrac{1}{2}y_i, y_i, z_i)\).


(b) What are the coordinates of the four vertices of the shear square in View 10? 


(c) The matrix


\[ \begin{bmatrix} 1 & 0 & 0 \\ .6 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]


determines a shear in the y-direction with factor .6 with respect to the x-coordinate (an example appears
in View 11). Sketch a view of the square in View 9 after such a shearing transformation, and find the
new coordinates of its four vertices.


Ex-View 10 View 9 sheared along the x-axis by ½ with respect to the y-coordinate (Exercise 2).


Ex-View 11 View 1 sheared along the y-axis by .6 with respect to the x-coordinate (Exercise 2).


Answer:


(b) (0, 0, 0), (1, 0, 0), (1½, 1, 0), and (½, 1, 0)


(c) (0, 0, 0), (1, .6, 0), (1, 1.6, 0), (0, 1, 0)


3. (a) The reflection about the xz-plane is defined as the transformation that takes a point \((x_i, y_i, z_i)\) to the
point \((x_i, -y_i, z_i)\) (e.g., View 12). If P and P′ are the coordinate matrices of a view and its reflection
about the xz-plane, respectively, find a matrix M such that P’ = MP. 


(b) Analogous to part (a), define the reflection about the yz-plane and construct the corresponding 
transformation matrix. Draw a sketch of View 1 reflected about the yz-plane. 


(c) Analogous to part (a), define the reflection about the xy-plane and construct the corresponding 
transformation matrix. Draw a sketch of View 1 reflected about the xy-plane. 


Ex-View 12 View 1 reflected about the xz-plane (Exercise 3). 


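Answer:

(a)
\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

(b)
\[ \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

(c)
\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix} \]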


4. (a) View 13 is View 1 subject to the following five transformations: 


1. Scale by a factor of 5 in the x-direction, 2 in the y-direction, and ; in the z-direction. 


1 


2. Translate > unit in the x-direction. 


3. Rotate 20° about the x-axis.
4. Rotate −45° about the y-axis.
5. Rotate 90° about the z-axis.
Construct the five matrices \(M_1\), \(M_2\), \(M_3\), \(M_4\), and \(M_5\) associated with these five transformations.


(b) If P is the coordinate matrix of View 1 and P′ is the coordinate matrix of View 13, express P′ in terms
of \(M_1\), \(M_2\), \(M_3\), \(M_4\), \(M_5\), and P.


Ex-View 13 View 1 scaled, translated, and rotated (Exercise 4) 


Answer: 
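(a)
\[
M_1 = \begin{bmatrix} \tfrac12 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & \tfrac13 \end{bmatrix}, \quad
M_2 = \begin{bmatrix} \tfrac12 & \tfrac12 & \cdots & \tfrac12 \\ 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \end{bmatrix}, \quad
M_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos 20^\circ & -\sin 20^\circ \\ 0 & \sin 20^\circ & \cos 20^\circ \end{bmatrix},
\]
\[
M_4 = \begin{bmatrix} \cos(-45^\circ) & 0 & \sin(-45^\circ) \\ 0 & 1 & 0 \\ -\sin(-45^\circ) & 0 & \cos(-45^\circ) \end{bmatrix}, \quad
M_5 = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]


(b) $P' = M_5 M_4 M_3 (M_1 P + M_2)$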


5. (a) View 14 is View 1 subject to the following seven transformations:


1. Scale by a factor of .3 in the x-direction and by a factor of .5 in the y-direction.
2. Rotate 45° about the x-axis.
3. Translate 1 unit in the x-direction.
4. Rotate 35° about the y-axis.
5. Rotate −45° about the z-axis.
6. Translate 1 unit in the z-direction.
7. Scale by a factor of 2 in the x-direction.
Construct the matrices $M_1$, $M_2$, ..., $M_7$ associated with these seven transformations.


(b) If $P$ is the coordinate matrix of View 1 and $P'$ is the coordinate matrix of View 14, express $P'$ in terms
of $M_1$, $M_2$, ..., $M_7$, and $P$.


Ex-View 14 View 1 scaled, translated, and rotated (Exercise 5). 


Answer: 
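(a)
\[
M_1 = \begin{bmatrix} .3 & 0 & 0 \\ 0 & .5 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
M_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos 45^\circ & -\sin 45^\circ \\ 0 & \sin 45^\circ & \cos 45^\circ \end{bmatrix}, \quad
M_3 = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \end{bmatrix},
\]
\[
M_4 = \begin{bmatrix} \cos 35^\circ & 0 & \sin 35^\circ \\ 0 & 1 & 0 \\ -\sin 35^\circ & 0 & \cos 35^\circ \end{bmatrix}, \quad
M_5 = \begin{bmatrix} \cos(-45^\circ) & -\sin(-45^\circ) & 0 \\ \sin(-45^\circ) & \cos(-45^\circ) & 0 \\ 0 & 0 & 1 \end{bmatrix},
\]
\[
M_6 = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ 1 & 1 & \cdots & 1 \end{bmatrix}, \quad
M_7 = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]


(b) $P' = M_7 (M_5 M_4 (M_2 M_1 P + M_3) + M_6)$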


6. Suppose that a view with coordinate matrix $P$ is to be rotated through an angle $\theta$ about an axis through the
origin and specified by two angles $\alpha$ and $\beta$ (see Figure Ex-6). If $P'$ is the coordinate matrix of the rotated
view, find rotation matrices $R_1$, $R_2$, $R_3$, $R_4$, and $R_5$ such that

\[ P' = R_5 R_4 R_3 R_2 R_1 P \]

[Hint: The desired rotation can be accomplished in the following five steps:
1. Rotate through an angle of $\beta$ about the y-axis.
2. Rotate through an angle of $\alpha$ about the z-axis.
3. Rotate through an angle of $\theta$ about the y-axis.
4. Rotate through an angle of $-\alpha$ about the z-axis.
5. Rotate through an angle of $-\beta$ about the y-axis.]


Figure Ex-6 


Answer: 


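\[
R_1 = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}, \quad
R_2 = \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix},
\]
\[
R_3 = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}, \quad
R_4 = \begin{bmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix},
\]
\[
R_5 = \begin{bmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{bmatrix}
\]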
7. This exercise illustrates a technique for translating a point with coordinates $(x_i, y_i, z_i)$ to a point with
coordinates $(x_i + x_0, y_i + y_0, z_i + z_0)$ by matrix multiplication rather than matrix addition.


(a) Let the point $(x_i, y_i, z_i)$ be associated with the column vector

\[ \mathbf{v}_i = \begin{bmatrix} x_i \\ y_i \\ z_i \\ 1 \end{bmatrix} \]

and let the point $(x_i + x_0, y_i + y_0, z_i + z_0)$ be associated with the column vector

\[ \mathbf{v}_i' = \begin{bmatrix} x_i + x_0 \\ y_i + y_0 \\ z_i + z_0 \\ 1 \end{bmatrix} \]

Find a 4 × 4 matrix $M$ such that $\mathbf{v}_i' = M\mathbf{v}_i$.


(b) Find the specific 4 × 4 matrix of the above form that will effect the translation of the point $(4, -2, 3)$
to the point $(-1, 7, 0)$.


Answer: 
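(a)
\[ M = \begin{bmatrix} 1 & 0 & 0 & x_0 \\ 0 & 1 & 0 & y_0 \\ 0 & 0 & 1 & z_0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

(b)
\[ \begin{bmatrix} 1 & 0 & 0 & -5 \\ 0 & 1 & 0 & 9 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]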

8. For the three rotation matrices given with Views 4, 5, and 6, show that
\[ R^{-1} = R^T \]


(A matrix with this property is called an orthogonal matrix. See Section 7.1.) 


Section 10.10 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a 
basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you 
will be able to use your technology utility to solve many of the problems in the regular exercise sets. 


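T1. Let $(a, b, c)$ be a unit vector normal to the plane $ax + by + cz = 0$, and let $\mathbf{r} = (x, y, z)$ be a vector. It
can be shown that the mirror image of the vector $\mathbf{r}$ through the above plane has coordinates
$\mathbf{r}_m = (x_m, y_m, z_m)$, where

\[ \begin{bmatrix} x_m \\ y_m \\ z_m \end{bmatrix} = M \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]

with

\[ M = I - 2\mathbf{n}\mathbf{n}^T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - 2 \begin{bmatrix} a \\ b \\ c \end{bmatrix} \begin{bmatrix} a & b & c \end{bmatrix} \]


(a) Show that $M^2 = I$ and give a physical reason why this must be so. [Hint: Use the fact that $(a, b, c)$ is a
unit vector to show that $\mathbf{n}^T\mathbf{n} = 1$.]


(b) Use a computer to show that $\det(M) = -1$.

(c) The eigenvectors of $M$ satisfy the equation

\[ \begin{bmatrix} x_m \\ y_m \\ z_m \end{bmatrix} = M \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \lambda \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]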


and therefore correspond to those vectors whose direction is not affected by a reflection through the plane. 
Use a computer to determine the eigenvectors and eigenvalues of M, and then give a physical argument to 
support your answer. 
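One way to start on parts (a) and (b) of Exercise T1 is sketched below in plain Python. This is only an illustration under stated assumptions: the unit normal (1/3, 2/3, 2/3) is an arbitrary choice, and the helper names `reflection_matrix`, `matmul`, and `det3` are hypothetical, not taken from any particular technology utility.

```python
def reflection_matrix(a, b, c):
    """M = I - 2 n n^T for the unit normal n = (a, b, c)."""
    n = (a, b, c)
    return [[(1.0 if i == j else 0.0) - 2.0 * n[i] * n[j] for j in range(3)]
            for i in range(3)]


def matmul(A, B):
    """Product of two 3 x 3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def det3(A):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 1."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


# Assumed example normal: (1/3, 2/3, 2/3) has length 1.
M = reflection_matrix(1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0)
M2 = matmul(M, M)   # should be the identity up to rounding
print(M2)
print(det3(M))      # should be -1 up to rounding
```

A numerical check for one normal vector is evidence, not a proof; the hint in part (a) gives the general argument.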


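T2. A vector $\mathbf{r} = (x, y, z)$ is rotated by an angle $\theta$ about an axis having unit vector $(a, b, c)$, thereby forming
the rotated vector $\mathbf{r}_R = (x_R, y_R, z_R)$. It can be shown that

\[ \begin{bmatrix} x_R \\ y_R \\ z_R \end{bmatrix} = R(\theta) \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]

with

\[
R(\theta) = \cos(\theta) \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
+ (1 - \cos(\theta)) \begin{bmatrix} a \\ b \\ c \end{bmatrix} \begin{bmatrix} a & b & c \end{bmatrix}
+ \sin(\theta) \begin{bmatrix} 0 & -c & b \\ c & 0 & -a \\ -b & a & 0 \end{bmatrix}
\]


(a) Use a computer to show that $R(\theta)R(\varphi) = R(\theta + \varphi)$, and then give a physical reason why this must be so.
Depending on the sophistication of the computer you are using, you may have to experiment using different
values of $a$, $b$, and $c = \sqrt{1 - a^2 - b^2}$.


(b) Show also that $R^{-1}(\theta) = R(-\theta)$ and give a physical reason why this must be so.
(c) Use a computer to show that $\det(R(\theta)) = +1$.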
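A numerical experiment for part (a) of Exercise T2 might look like the following Python sketch. The axis (1/3, 2/3, 2/3) and the angles 0.3 and 0.5 radians are arbitrary assumed test values, and the helper names are hypothetical.

```python
import math


def rotation_matrix(theta, a, b, c):
    """R(theta) for a rotation about the unit axis (a, b, c)."""
    n = (a, b, c)
    K = [[0.0, -c, b], [c, 0.0, -a], [-b, a, 0.0]]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [[cos_t * (1.0 if i == j else 0.0)
             + (1.0 - cos_t) * n[i] * n[j]
             + sin_t * K[i][j]
             for j in range(3)] for i in range(3)]


def matmul(A, B):
    """Product of two 3 x 3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def max_abs_diff(A, B):
    """Largest entrywise difference between two 3 x 3 matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))


a, b = 1.0 / 3.0, 2.0 / 3.0
c = math.sqrt(1.0 - a * a - b * b)   # third component of the unit axis
theta, phi = 0.3, 0.5                # assumed test angles, in radians

lhs = matmul(rotation_matrix(theta, a, b, c), rotation_matrix(phi, a, b, c))
rhs = rotation_matrix(theta + phi, a, b, c)
print(max_abs_diff(lhs, rhs))        # expected to be at rounding-error level
```

Repeating the experiment with several axes and angles, as the exercise suggests, guards against a coincidence at one particular choice.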




10.11 Equilibrium Temperature Distributions 


In this section we will see that the equilibrium temperature distribution within a trapezoidal plate can be found 
when the temperatures around the edges of the plate are specified. The problem is reduced to solving a system of 


linear equations. Also, an iterative technique for solving the problem and a “random walk” approach to the 
problem are described. 


Prerequisites 


Linear Systems 
Matrices 


Intuitive Understanding of Limits 


Boundary Data 


Suppose that the two faces of the thin trapezoidal plate shown in Figure 10.11.1a are insulated from heat. Suppose 
that we are also given the temperature along the four edges of the plate. For example, let the temperature be 
constant on each edge with values of 0°, 0°, 1°, and 2°, as in the figure. After a period of time, the temperature 
inside the plate will stabilize. Our objective in this section is to determine this equilibrium temperature distribution 
at the points inside the plate. As we will see, the interior equilibrium temperature is completely determined by the 
boundary data—that is, the temperature along the edges of the plate. 




(a) (b) 
Figure 10.11.1 
The equilibrium temperature distribution can be visualized by the use of curves that connect points of equal
temperature. Such curves are called isotherms of the temperature distribution. In Figure 10.11.1b we have
sketched a few isotherms, using information we derive later in the chapter.


Although all our calculations will be for the trapezoidal plate illustrated, our techniques generalize easily to a plate 
of any practical shape. They also generalize to the problem of finding the temperature within a three-dimensional 
body. In fact, our “plate” could be the cross section of some solid object if the flow of heat perpendicular to the 
cross section is negligible. For example, Figure 10.11.1 could represent the cross section of a long dam. The dam is 
exposed to three different temperatures: the temperature of the ground at its base, the temperature of the water on 
one side, and the temperature of the air on the other side. A knowledge of the temperature distribution inside the 
dam is necessary to determine the thermal stresses to which it is subjected. 


Next we will consider a certain thermodynamic principle that characterizes the temperature distribution we are 


seeking. 


The Mean-Value Property 


There are many different ways to obtain a mathematical model for our problem. The approach we use is based on 
the following property of equilibrium temperature distributions. 


THEOREM 10.11.1. The Mean-Value Property 


Let a plate be in thermal equilibrium and let P be a point inside the plate. Then if C is any circle with 
center at P that is completely contained in the plate, the temperature at P is the average value of the 
temperature on the circle (Figure 10.11.2). 




Figure 10.11.2 


This property is a consequence of certain basic laws of molecular motion, and we will not attempt to derive it. 

Basically, this property states that in equilibrium, thermal energy tends to distribute itself as evenly as possible 
consistent with the boundary conditions. It can be shown that the mean-value property uniquely determines the 
equilibrium temperature distribution of a plate. 


Unfortunately, determining the equilibrium temperature distribution from the mean-value property is not an easy 
matter. However, if we restrict ourselves to finding the temperature only at a finite set of points within the plate, 
the problem can be reduced to solving a linear system. We pursue this idea next. 


Discrete Formulation of the Problem 


We can overlay our trapezoidal plate with a succession of finer and finer square nets or meshes (Figure 10.11.3). In 
(a) we have a rather coarse net; in (b) we have a net with half the spacing as in (a); and in (c) we have a net with 
the spacing again reduced by half. The points of intersection of the net lines are called mesh points. We classify 
them as boundary mesh points if they fall on the boundary of the plate or as interior mesh points if they lie in the 
interior of the plate. For the three net spacings we have chosen, there are 1, 9, and 49 interior mesh points, 
respectively. 


(a) 1 interior mesh point     (b) 9 interior mesh points     (c) 49 interior mesh points


Figure 10.11.3 


In the discrete formulation of our problem, we try to find the temperature only at the interior mesh points of some 
particular net. For a rather fine net, as in (c), this will provide an excellent picture of the temperature distribution 
throughout the entire plate. 


At the boundary mesh points, the temperature is given by the boundary data. (In Figure 10.11.3 we have labeled all 
the boundary mesh points with their corresponding temperatures.) At the interior mesh points, we will apply the 
following discrete version of the mean-value property. 


THEOREM 10.11.2 Discrete Mean-Value Property 


At each interior mesh point, the temperature is approximately the average of the temperatures at the four 
neighboring mesh points. 


This discrete version is a reasonable approximation to the true mean-value property. But because it is only an 
approximation, it will provide only an approximation to the true temperatures at the interior mesh points. However, 
the approximations will get better as the mesh spacing decreases. In fact, as the mesh spacing approaches zero, the 
approximations approach the exact temperature distribution, a fact proved in advanced courses in numerical 
analysis. We will illustrate this convergence by computing the approximate temperatures at the mesh points for the 
three mesh spacings given in Figure 10.11.3. 


Case (a) of Figure 10.11.3 is simple, for there is only one interior mesh point. If we let $t_0$ be the temperature at this
mesh point, the discrete mean-value property immediately gives

\[ t_0 = \tfrac14 (2 + 1 + 0 + 0) = 0.75 \]


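In case (b) we can label the temperatures at the nine interior mesh points $t_1, t_2, \ldots, t_9$, as in Figure 10.11.3b. (The
particular ordering is not important.) By applying the discrete mean-value property successively to each of these
nine mesh points, we obtain the following nine equations:

\[
\begin{aligned}
t_1 &= \tfrac14 (t_2 + 2 + 0 + 0) \\
t_2 &= \tfrac14 (t_1 + t_3 + t_4 + 2) \\
t_3 &= \tfrac14 (t_2 + t_5 + 0 + 0) \\
t_4 &= \tfrac14 (t_2 + t_5 + t_7 + 2) \\
t_5 &= \tfrac14 (t_3 + t_4 + t_6 + t_8) \\
t_6 &= \tfrac14 (t_5 + t_9 + 0 + 0) \\
t_7 &= \tfrac14 (t_4 + t_8 + 1 + 2) \\
t_8 &= \tfrac14 (t_5 + t_7 + t_9 + 1) \\
t_9 &= \tfrac14 (t_6 + t_8 + 1 + 0)
\end{aligned}
\tag{1}
\]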


This is a system of nine linear equations in nine unknowns. We can rewrite it in matrix form as 


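\[ \mathbf{t} = M\mathbf{t} + \mathbf{b} \tag{2} \]

where

\[
\mathbf{t} = \begin{bmatrix} t_1 \\ t_2 \\ t_3 \\ t_4 \\ t_5 \\ t_6 \\ t_7 \\ t_8 \\ t_9 \end{bmatrix}, \quad
M = \frac14 \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0
\end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} \tfrac12 \\ \tfrac12 \\ 0 \\ \tfrac12 \\ 0 \\ 0 \\ \tfrac34 \\ \tfrac14 \\ \tfrac14 \end{bmatrix}
\]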


To solve Equation 2, we write it as 


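\[ (I - M)\mathbf{t} = \mathbf{b} \]


The solution for $\mathbf{t}$ is thus

\[ \mathbf{t} = (I - M)^{-1}\mathbf{b} \tag{3} \]


as long as the matrix $(I - M)$ is invertible. This is indeed the case, and the solution for $\mathbf{t}$ as calculated by Equation 3 is

\[
\mathbf{t} = \begin{bmatrix} 0.7846 \\ 1.1383 \\ 0.4719 \\ 1.2967 \\ 0.7491 \\ 0.3265 \\ 1.2995 \\ 0.9014 \\ 0.5570 \end{bmatrix} \tag{4}
\]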


Figure 10.11.4 is a diagram of the plate with the nine interior mesh points labeled with their temperatures as given 
by this solution. 
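The solution in Equation 4 can be reproduced with a few lines of code. The following Python sketch is an illustration only: it encodes Equations (1) as neighbor lists (the names `NEIGHBORS` and `BOUNDARY_SUM` are assumptions made here) and solves $(I - M)\mathbf{t} = \mathbf{b}$ by Gaussian elimination rather than by forming the inverse explicitly.

```python
# Interior neighbors (1-based labels) and the sum of the boundary temperatures
# adjacent to each of the nine mesh points, read off from Equations (1).
NEIGHBORS = [[2], [1, 3, 4], [2, 5], [2, 5, 7], [3, 4, 6, 8],
             [5, 9], [4, 8], [5, 7, 9], [6, 8]]
BOUNDARY_SUM = [2, 2, 0, 2, 0, 0, 3, 1, 1]


def build_system():
    """Return the matrix I - M and the vector b of Equation (2)."""
    n = len(NEIGHBORS)
    A = [[0.0] * n for _ in range(n)]
    b = [s / 4.0 for s in BOUNDARY_SUM]
    for i, nbrs in enumerate(NEIGHBORS):
        A[i][i] = 1.0
        for j in nbrs:
            A[i][j - 1] = -0.25
    return A, b


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


A, b = build_system()
t = solve(A, b)
print([round(value, 4) for value in t])
```

Elimination is used here because it avoids computing $(I - M)^{-1}$; for a system this small either route gives the same temperatures.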


0.7846

1.1383    0.4719

1.2967    0.7491    0.3265

1.2995    0.9014    0.5570


Figure 10.11.4 


For case (c) of Figure 10.11.3, we repeat this same procedure. We label the temperatures at the 49 interior mesh
points as $t_1, t_2, \ldots, t_{49}$ in some manner. For example, we may begin at the top of the plate and proceed from left to
right along each row of mesh points. Applying the discrete mean-value property to each mesh point gives a system
of 49 linear equations in 49 unknowns:

\[
\begin{aligned}
t_1 &= \tfrac14 (t_2 + 2 + 0 + 0) \\
t_2 &= \tfrac14 (t_1 + t_3 + t_4 + 2) \\
&\;\;\vdots \\
t_{48} &= \tfrac14 (t_{41} + t_{47} + t_{49} + 1) \\
t_{49} &= \tfrac14 (t_{42} + t_{48} + 0 + 1)
\end{aligned}
\tag{5}
\]

In matrix form, Equations 5 are

\[ \mathbf{t} = M\mathbf{t} + \mathbf{b} \]

where $\mathbf{t}$ and $\mathbf{b}$ are column vectors with 49 entries, and $M$ is a 49 × 49 matrix. As in Equation 3, the solution for $\mathbf{t}$ is

\[ \mathbf{t} = (I - M)^{-1}\mathbf{b} \tag{6} \]


In Figure 10.11.5 we display the temperatures at the 49 mesh points found by Equation 6. The nine unshaded 
temperatures in this figure fall on the mesh points of Figure 10.11.4. 


0.7903

1.1611    0.4915

1.3625    0.8048    0.3528

1.4844    1.0122    0.6064    0.2710

1.5627    1.1533    0.7896    0.4778    0.2162

1.6131    1.2488    0.9210    0.6342    0.3868    0.1756

1.6409    1.3078    1.0114    0.7513    0.5214    0.3157    0.1344

1.6426    1.3301    1.0657    0.8380    0.6318    0.4312    0.2221

1.5994    1.3042    1.0834    0.9032    0.7365    0.5554    0.3227

1.4508    1.2039    1.0605    0.9548    0.8556    0.7311    0.5135


Figure 10.11.5 


In Table 1 we compare the temperatures at these nine common mesh points for the three different mesh spacings 
used. 


Table 1

Temperatures at Common Mesh Points

         Case (a)    Case (b)    Case (c)
t_1                  0.7846      0.8048
t_2                  1.1383      1.1533
t_3                  0.4719      0.4778
t_4                  1.2967      1.3078
t_5      0.7500      0.7491      0.7513
t_6                  0.3265      0.3157
t_7                  1.2995      1.3042
t_8                  0.9014      0.9032
t_9                  0.5570      0.5554


Knowing that the temperatures of the discrete problem approach the exact temperatures as the mesh spacing 
decreases, we may surmise that the nine temperatures obtained in case (c) are closer to the exact values than those 
in case (b). 


A Numerical Technique 


To obtain the 49 temperatures in case (c) of Figure 10.11.3, it was necessary to solve a linear system with 49 
unknowns. A finer net might involve a linear system with hundreds or even thousands of unknowns. Exact 
algorithms for the solutions of such large systems are impractical, and for this reason we now discuss a numerical 
technique for the practical solution of these systems. 

To describe this technique, we look again at Equation 2: 


t= Mt+b (7) 


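The vector $\mathbf{t}$ we are seeking appears on both sides of this equation. We consider a way of generating better and
better approximations to the vector solution $\mathbf{t}$. For the initial approximation $\mathbf{t}^{(0)}$ we can take $\mathbf{t}^{(0)} = \mathbf{0}$ if no better
choice is available. If we substitute $\mathbf{t}^{(0)}$ into the right side of Equation 7 and label the resulting left side as $\mathbf{t}^{(1)}$, we have

\[ \mathbf{t}^{(1)} = M\mathbf{t}^{(0)} + \mathbf{b} \tag{8} \]

If we substitute $\mathbf{t}^{(1)}$ into the right side of Equation 7, we generate another approximation, which we label $\mathbf{t}^{(2)}$:

\[ \mathbf{t}^{(2)} = M\mathbf{t}^{(1)} + \mathbf{b} \tag{9} \]

Continuing in this way, we generate a sequence of approximations as follows:

\[
\begin{aligned}
\mathbf{t}^{(1)} &= M\mathbf{t}^{(0)} + \mathbf{b} \\
\mathbf{t}^{(2)} &= M\mathbf{t}^{(1)} + \mathbf{b} \\
\mathbf{t}^{(3)} &= M\mathbf{t}^{(2)} + \mathbf{b} \\
&\;\;\vdots \\
\mathbf{t}^{(n)} &= M\mathbf{t}^{(n-1)} + \mathbf{b}
\end{aligned}
\tag{10}
\]

One would hope that this sequence of approximations $\mathbf{t}^{(0)}, \mathbf{t}^{(1)}, \mathbf{t}^{(2)}, \ldots$ converges to the exact solution of Equation 7. We do
not have the space here to go into the theoretical considerations necessary to show this. Suffice it to say that for the
particular problem we are considering, the sequence converges to the exact solution for any mesh size and for any
initial approximation $\mathbf{t}^{(0)}$.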


This technique of generating successive approximations to the solution of Equation 7 is a variation of a technique called
Jacobi iteration; the approximations themselves are called iterates. As a numerical example, let us apply Jacobi
iteration to the calculation of the nine mesh point temperatures of case (b). Setting $\mathbf{t}^{(0)} = \mathbf{0}$, we have, from
Equation 2,


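\[
\mathbf{t}^{(1)} = M\mathbf{t}^{(0)} + \mathbf{b} = M\mathbf{0} + \mathbf{b} = \mathbf{b} =
\begin{bmatrix} 0.5000 \\ 0.5000 \\ 0.0000 \\ 0.5000 \\ 0.0000 \\ 0.0000 \\ 0.7500 \\ 0.2500 \\ 0.2500 \end{bmatrix}, \quad
\mathbf{t}^{(2)} = M\mathbf{t}^{(1)} + \mathbf{b} =
\begin{bmatrix} 0.6250 \\ 0.7500 \\ 0.1250 \\ 0.8125 \\ 0.1875 \\ 0.0625 \\ 0.9375 \\ 0.5000 \\ 0.3125 \end{bmatrix}
\]


Some additional iterates are

\[
\mathbf{t}^{(3)} = \begin{bmatrix} 0.6875 \\ 0.8906 \\ 0.2344 \\ 0.9688 \\ 0.3750 \\ 0.1250 \\ 1.0781 \\ 0.6094 \\ 0.3906 \end{bmatrix}, \quad
\mathbf{t}^{(10)} = \begin{bmatrix} 0.7791 \\ 1.1230 \\ 0.4573 \\ 1.2770 \\ 0.7236 \\ 0.3131 \\ 1.2848 \\ 0.8827 \\ 0.5446 \end{bmatrix}, \quad
\mathbf{t}^{(20)} = \begin{bmatrix} 0.7845 \\ 1.1380 \\ 0.4716 \\ 1.2963 \\ 0.7486 \\ 0.3263 \\ 1.2992 \\ 0.9010 \\ 0.5567 \end{bmatrix}, \quad
\mathbf{t}^{(30)} = \begin{bmatrix} 0.7846 \\ 1.1383 \\ 0.4719 \\ 1.2967 \\ 0.7491 \\ 0.3265 \\ 1.2995 \\ 0.9014 \\ 0.5570 \end{bmatrix}
\]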


All iterates beginning with the thirtieth are equal to $\mathbf{t}^{(30)}$ to four decimal places. Consequently, $\mathbf{t}^{(30)}$ is the exact
solution to four decimal places. This agrees with our previous result given in Equation 4.


The Jacobi iteration scheme applied to the linear system 5 with 49 unknowns produces iterates that begin repeating
to four decimal places after 119 iterations. Thus, $\mathbf{t}^{(119)}$ would provide the 49 temperatures of case (c) correct to
four decimal places.
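The iteration $\mathbf{t}^{(k+1)} = M\mathbf{t}^{(k)} + \mathbf{b}$ for the nine-point mesh of case (b) is short enough to sketch in Python. This is a minimal illustration, not the authors' program: `NEIGHBORS` and `B` restate Equations (1) and the vector $\mathbf{b}$, and the function name `jacobi` is an assumption.

```python
# Interior neighbors (1-based labels) of each mesh point and the vector b,
# both taken from Equations (1) and (2) for case (b).
NEIGHBORS = [[2], [1, 3, 4], [2, 5], [2, 5, 7], [3, 4, 6, 8],
             [5, 9], [4, 8], [5, 7, 9], [6, 8]]
B = [0.5, 0.5, 0.0, 0.5, 0.0, 0.0, 0.75, 0.25, 0.25]


def jacobi(iterations, start=None):
    """Return the iterate t^(iterations), starting from t^(0) = start (default 0)."""
    t = list(start) if start is not None else [0.0] * len(B)
    for _ in range(iterations):
        # Every entry of the new iterate uses only the previous iterate.
        t = [0.25 * sum(t[j - 1] for j in nbrs) + b_i
             for nbrs, b_i in zip(NEIGHBORS, B)]
    return t


print(jacobi(2))                                 # second iterate
print([round(value, 4) for value in jacobi(30)])  # thirtieth iterate
```

Because $M$ has at most four nonzero entries per row, each sweep costs only a handful of additions per mesh point, which is why the scheme scales to fine meshes. Passing `start=[1.0] * 9` reproduces the setting of Exercise 3 below.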


A Monte Carlo Technique 


In this section we describe a so-called Monte Carlo technique for computing the temperature at a single interior 
mesh point of the discrete problem without having to compute the temperatures at the remaining interior mesh 
points. First we define a discrete random walk along the net. By this we mean a directed path along the net lines 
(Figure 10.11.6) that joins a succession of mesh points such that the direction of departure from each mesh point is 
chosen at random. Each of the four possible directions of departure from each mesh point along the path is to be 
equally probable. 


Figure 10.11.6 


By the use of random walks, we can compute the temperature at a specified interior mesh point on the basis of the 
following property. 


THEOREM 10.11.3. Random Walk Property 


Let $W_1, W_2, \ldots, W_n$ be a succession of random walks, all of which begin at a specified interior mesh point.
Let $t_1^*, t_2^*, \ldots, t_n^*$ be the temperatures at the boundary mesh points first encountered along each of these
random walks. Then the average value $(t_1^* + t_2^* + \cdots + t_n^*)/n$ of these boundary temperatures approaches
the temperature at the specified interior mesh point as the number of random walks $n$ increases without
bound.


This property is a consequence of the discrete mean-value property that the mesh point temperatures satisfy. The 
proof of the random walk property involves elementary concepts from probability theory, and we will not give it 
here. 


In Table 2 we display the results of a large number of computer-generated random walks for the evaluation of the
temperature $t_5$ of the nine-point mesh of case (b) in Figure 10.11.6. The first column lists the number $n$ of the
random walk. The second column lists the temperature $t_n^*$ of the boundary point first encountered along the
corresponding random walk. The last column contains the cumulative average of the boundary temperatures
encountered along the $n$ random walks. Thus, after 1000 random walks we have the approximation $t_5 \approx .7550$.
This compares with the exact value $t_5 = .7491$ that we had previously evaluated. As can be seen, the convergence
to the exact value is not too rapid.
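The random walk experiment can be imitated with a short simulation. The Python sketch below is an assumption-laden illustration: the table `MOVES` encodes the four equally likely departures from each interior mesh point of case (b), as read off from Equations (1), and the seed and number of walks are arbitrary choices, so the printed estimate will not match Table 2 entry for entry.

```python
import random

# For each interior mesh point (1-based label), the four equally likely moves:
# ("t", k) steps to interior point k, ("b", T) hits a boundary point at temperature T.
MOVES = {
    1: [("t", 2), ("b", 2), ("b", 0), ("b", 0)],
    2: [("t", 1), ("t", 3), ("t", 4), ("b", 2)],
    3: [("t", 2), ("t", 5), ("b", 0), ("b", 0)],
    4: [("t", 2), ("t", 5), ("t", 7), ("b", 2)],
    5: [("t", 3), ("t", 4), ("t", 6), ("t", 8)],
    6: [("t", 5), ("t", 9), ("b", 0), ("b", 0)],
    7: [("t", 4), ("t", 8), ("b", 1), ("b", 2)],
    8: [("t", 5), ("t", 7), ("t", 9), ("b", 1)],
    9: [("t", 6), ("t", 8), ("b", 1), ("b", 0)],
}


def random_walk(start, rng):
    """Walk until a boundary point is reached; return its temperature."""
    point = start
    while True:
        kind, value = rng.choice(MOVES[point])
        if kind == "b":
            return value
        point = value


def estimate(start, walks, seed=0):
    """Average boundary temperature over the given number of random walks."""
    rng = random.Random(seed)
    return sum(random_walk(start, rng) for _ in range(walks)) / walks


print(estimate(5, 1000))    # rough estimate of t_5
print(estimate(5, 20000))   # more walks, usually closer to .7491
```

The error of such an average typically shrinks only like $1/\sqrt{n}$, which is consistent with the slow convergence seen in Table 2.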


Table 2 


n       t_n*     (t_1* + ... + t_n*)/n
1       1        1.0000
2       2        1.5000
3       1        1.3333
4       0        1.0000
5       2        1.2000
6       0        1.0000
7       2        1.1429
8       0        1.0000
9       2        1.1111
10      0        1.0000
20               0.9500
30               0.8000
40               0.8250
50               0.8400
100              0.8300
150              0.8000
200              0.8050
250              0.8240
500              0.7860
1000             0.7550


Exercise Set 10.11 


1. A plate in the form of a circular disk has boundary temperatures of 0° on the left half of its circumference and 1° on
the right half of its circumference. A net with four interior mesh points is overlaid on the disk (see Figure
Ex-1).


(a) Using the discrete mean-value property, write the 4 × 4 linear system $\mathbf{t} = M\mathbf{t} + \mathbf{b}$ that determines the
approximate temperatures at the four interior mesh points.


(b) Solve the linear system in part (a).
(c) Use the Jacobi iteration scheme with $\mathbf{t}^{(0)} = \mathbf{0}$ to generate the iterates $\mathbf{t}^{(1)}$, $\mathbf{t}^{(2)}$, $\mathbf{t}^{(3)}$, $\mathbf{t}^{(4)}$, and $\mathbf{t}^{(5)}$ for the
linear system in part (a). What is the “error vector” $\mathbf{t}^{(5)} - \mathbf{t}$, where $\mathbf{t}$ is the solution found in part (b)?


(d) By certain advanced methods, it can be determined that the exact temperatures to four decimal places at the
four mesh points are $t_1 = t_3 = .2871$ and $t_2 = t_4 = .7129$. What are the percentage errors in the values
found in part (b)?
Figure Ex-1 
Answer: 
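(a)
\[
\begin{bmatrix} t_1 \\ t_2 \\ t_3 \\ t_4 \end{bmatrix} =
\begin{bmatrix} 0 & \tfrac14 & \tfrac14 & 0 \\ \tfrac14 & 0 & 0 & \tfrac14 \\ \tfrac14 & 0 & 0 & \tfrac14 \\ 0 & \tfrac14 & \tfrac14 & 0 \end{bmatrix}
\begin{bmatrix} t_1 \\ t_2 \\ t_3 \\ t_4 \end{bmatrix} +
\begin{bmatrix} 0 \\ \tfrac12 \\ 0 \\ \tfrac12 \end{bmatrix}
\]

(b)
\[ \mathbf{t} = \begin{bmatrix} \tfrac14 \\ \tfrac34 \\ \tfrac14 \\ \tfrac34 \end{bmatrix} \]

(c)
\[
\mathbf{t}^{(1)} = \begin{bmatrix} 0 \\ \tfrac12 \\ 0 \\ \tfrac12 \end{bmatrix}, \quad
\mathbf{t}^{(2)} = \begin{bmatrix} \tfrac18 \\ \tfrac58 \\ \tfrac18 \\ \tfrac58 \end{bmatrix}, \quad
\mathbf{t}^{(3)} = \begin{bmatrix} \tfrac{3}{16} \\ \tfrac{11}{16} \\ \tfrac{3}{16} \\ \tfrac{11}{16} \end{bmatrix}, \quad
\mathbf{t}^{(4)} = \begin{bmatrix} \tfrac{7}{32} \\ \tfrac{23}{32} \\ \tfrac{7}{32} \\ \tfrac{23}{32} \end{bmatrix}, \quad
\mathbf{t}^{(5)} = \begin{bmatrix} \tfrac{15}{64} \\ \tfrac{47}{64} \\ \tfrac{15}{64} \\ \tfrac{47}{64} \end{bmatrix}, \quad
\mathbf{t}^{(5)} - \mathbf{t} = \begin{bmatrix} -\tfrac{1}{64} \\ -\tfrac{1}{64} \\ -\tfrac{1}{64} \\ -\tfrac{1}{64} \end{bmatrix}
\]


(d) for $t_1$ and $t_3$, −12.9%; for $t_2$ and $t_4$, 5.2%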


2. Use Theorem 10.11.1 to find the exact equilibrium temperature at the center of the disk in Exercise 1. 


Answer: 


1/2
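3. Calculate the first two iterates $\mathbf{t}^{(1)}$ and $\mathbf{t}^{(2)}$ for case (b) of Figure 10.11.3 with nine interior mesh points
[Equation 2] when the initial iterate is chosen as

\[ \mathbf{t}^{(0)} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}^T \]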


Answer: 
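\[ \mathbf{t}^{(1)} = \begin{bmatrix} \tfrac34 & \tfrac54 & \tfrac24 & \tfrac54 & \tfrac44 & \tfrac24 & \tfrac54 & \tfrac44 & \tfrac34 \end{bmatrix}^T \]
\[ \mathbf{t}^{(2)} = \begin{bmatrix} \tfrac{13}{16} & \tfrac{18}{16} & \tfrac{9}{16} & \tfrac{22}{16} & \tfrac{13}{16} & \tfrac{7}{16} & \tfrac{21}{16} & \tfrac{16}{16} & \tfrac{10}{16} \end{bmatrix}^T \]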


4. The random walk illustrated in Figure Ex-4a can be described by six arrows
that specify the directions of departure from the successive mesh points along the path. Figure Ex-4b is an array
of 100 computer-generated, randomly oriented arrows arranged in a 10 × 10 array. Use these arrows to
determine random walks to approximate the temperature $t_5$, as in Table 2. Proceed as follows:


1. Take the last two digits of your telephone number. Use the last digit to specify a row and the other to specify 
a column. 


2. Go to the arrow in the array with that row and column number. 


3. Using this arrow as a starting point, move through the array of arrows as you would read a book (left to right
and top to bottom). Beginning at the point labeled $t_5$ in Figure Ex-4a and using this sequence of arrows to
specify a sequence of directions, move from mesh point to mesh point until you reach a boundary mesh
point. This completes your first random walk. Record the temperature at the boundary mesh point. (If you
reach the end of the arrow array, continue with the arrow in the upper left corner.)


4. Return to the interior mesh point labeled $t_5$ and begin where you left off in the arrow array; generate your
next random walk. Repeat this process until you have completed 10 random walks and have recorded 10
boundary temperatures.


5. Calculate the average of the 10 boundary temperatures recorded. (The exact value is $t_5 = .7491$.)


Figure Ex-4 


Section 10.11 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic 
proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be 
able to use your technology utility to solve many of the problems in the regular exercise sets. 


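T1. Suppose that we have the square region described by

\[ R = \{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1\} \]

and suppose that the equilibrium temperature distribution $u(x, y)$ along the boundary is given by $u(x, 0) = T_B$,
$u(x, 1) = T_T$, $u(0, y) = T_L$, and $u(1, y) = T_R$. Suppose next that this region is partitioned into an
$(n + 1) \times (n + 1)$ mesh using

\[ x_i = \frac{i}{n} \quad \text{and} \quad y_j = \frac{j}{n} \]

for $i = 0, 1, 2, \ldots, n$ and $j = 0, 1, 2, \ldots, n$. If the temperatures of the interior mesh points are labeled by

\[ u_{i,j} = u(x_i, y_j) = u(i/n, j/n) \]

then show that

\[ u_{i,j} = \tfrac14 (u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1}) \]

for $i = 1, 2, 3, \ldots, n - 1$ and $j = 1, 2, 3, \ldots, n - 1$. To handle the boundary points, define

\[ u_{0,j} = T_L, \quad u_{n,j} = T_R, \quad u_{i,0} = T_B, \quad \text{and} \quad u_{i,n} = T_T \]

for $i = 1, 2, 3, \ldots, n - 1$ and $j = 1, 2, 3, \ldots, n - 1$. Next let

\[ F_{n+1} = \begin{bmatrix} 0 & I_n \\ 1 & 0 \end{bmatrix} \]

be the $(n + 1) \times (n + 1)$ matrix with the $n \times n$ identity matrix in the upper right-hand corner, a one in the lower
left-hand corner, and zeros everywhere else. For example,

\[
F_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad
F_3 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}, \quad
F_4 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}, \quad
F_5 = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix}
\]

and so on. By defining the $(n + 1) \times (n + 1)$ matrix

\[ M_{n+1} = F_{n+1} + F_{n+1}^T = \begin{bmatrix} 0 & I_n \\ 1 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ I_n & 0 \end{bmatrix} \]

show that if $U_{n+1}$ is the $(n + 1) \times (n + 1)$ matrix with entries $u_{i,j}$, then the set of equations

\[ u_{i,j} = \tfrac14 (u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1}) \]

for $i = 1, 2, 3, \ldots, n - 1$ and $j = 1, 2, 3, \ldots, n - 1$ can be written as the matrix equation

\[ U_{n+1} = \tfrac14 (M_{n+1} U_{n+1} + U_{n+1} M_{n+1}) \]

where we consider only those elements of $U_{n+1}$ with $i = 1, 2, 3, \ldots, n - 1$ and $j = 1, 2, 3, \ldots, n - 1$.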


T2. The results of the preceding exercise and the discussion in the text suggest the following algorithm for solving
for the equilibrium temperature in the square region

\[ R = \{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1\} \]

given the boundary conditions

\[ u(x, 0) = T_B, \quad u(x, 1) = T_T, \quad u(0, y) = T_L, \quad u(1, y) = T_R \]

1. Choose a value for $n$, and then choose an initial guess, say

\[
U_{n+1}^{(0)} = \begin{bmatrix}
0 & T_L & \cdots & T_L & 0 \\
T_B & 0 & \cdots & 0 & T_T \\
\vdots & \vdots & & \vdots & \vdots \\
T_B & 0 & \cdots & 0 & T_T \\
0 & T_R & \cdots & T_R & 0
\end{bmatrix}
\]

2. For each value of $k = 0, 1, 2, 3, \ldots$, compute $U_{n+1}^{(k+1)}$ using

\[ U_{n+1}^{(k+1)} = \tfrac14 \left( M_{n+1} U_{n+1}^{(k)} + U_{n+1}^{(k)} M_{n+1} \right) \]

where $M_{n+1}$ is as defined in Exercise T1. Then adjust $U_{n+1}^{(k+1)}$ by replacing all edge entries by the initial edge
entries in $U_{n+1}^{(0)}$. [Note: The edge entries of a matrix are the entries in the first and last columns and first and
last rows.]

3. Continue this process until $U_{n+1}^{(k+1)} - U_{n+1}^{(k)}$ is approximately the zero matrix. This suggests that

\[ U_{n+1} = \lim_{k \to \infty} U_{n+1}^{(k)} \]

Use a computer and this algorithm to solve for $u(x, y)$ given that

\[ u(x, 0) = 0, \quad u(x, 1) = 0, \quad u(0, y) = 0, \quad u(1, y) = 2 \]

Choose $n = 6$ and compute up to $U_7^{(30)}$. The exact solution can be expressed as

\[ u(x, y) = \frac{8}{\pi} \sum_{m=1}^{\infty} \frac{\sinh[(2m - 1)\pi x] \sin[(2m - 1)\pi y]}{(2m - 1) \sinh[(2m - 1)\pi]} \]

Use a computer to compute $u(i/6, j/6)$ for $i, j = 0, 1, 2, 3, 4, 5, 6$, and then compare your results to the values
of $u(i/6, j/6)$ in $U_7^{(30)}$.
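The algorithm of Exercise T2 might be prototyped as in the following Python sketch. It is offered as an illustration under stated assumptions rather than as a reference solution: the products $M_{n+1}U$ and $UM_{n+1}$ are formed with cyclic index shifts instead of explicit matrices, 200 sweeps are used instead of the count suggested in the exercise, the series is truncated after 50 terms, and all function names are hypothetical.

```python
import math


def initial_guess(n, t_left, t_right, t_bottom, t_top):
    """Build U^(0): boundary data on the edges, zeros elsewhere."""
    U = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(1, n):
        U[0][k] = t_left      # u_{0,j} on the edge x = 0
        U[n][k] = t_right     # u_{n,j} on the edge x = 1
        U[k][0] = t_bottom    # u_{i,0} on the edge y = 0
        U[k][n] = t_top       # u_{i,n} on the edge y = 1
    return U


def iterate(U, U0):
    """One step: (M U + U M) / 4, then restore the edge entries of U0."""
    size = len(U)
    last = size - 1
    # M U adds the rows above and below (cyclically); U M adds the columns
    # to the left and right (cyclically).
    new = [[0.25 * (U[(i - 1) % size][j] + U[(i + 1) % size][j]
                    + U[i][(j - 1) % size] + U[i][(j + 1) % size])
            for j in range(size)] for i in range(size)]
    for k in range(size):
        new[0][k], new[last][k] = U0[0][k], U0[last][k]
        new[k][0], new[k][last] = U0[k][0], U0[k][last]
    return new


def exact(x, y, terms=50):
    """Partial sum of the series solution for u(1, y) = 2, other edges 0."""
    total = 0.0
    for m in range(1, terms + 1):
        k = (2 * m - 1) * math.pi
        total += math.sinh(k * x) * math.sin(k * y) / ((2 * m - 1) * math.sinh(k))
    return 8.0 / math.pi * total


n = 6
U0 = initial_guess(n, t_left=0.0, t_right=2.0, t_bottom=0.0, t_top=0.0)
U = U0
for _ in range(200):
    U = iterate(U, U0)

for i in range(1, n):
    print([round(U[i][j], 4) for j in range(1, n)],
          [round(exact(i / n, j / n), 4) for j in range(1, n)])
```

Each printed row pairs the discrete values $U[i][j]$ with the series values $u(i/6, j/6)$. By symmetry the center value is 1/2 for both, while other interior points generally differ slightly because the mesh is coarse.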


T3. Using the exact solution $u(x, y)$ for the temperature distribution described in Exercise T2, use a graphing
program to do the following:


(a) Plot the surface $z = u(x, y)$ in three-dimensional xyz-space in which z is the temperature at the point (x, y) in
the square region.


(b) Plot several isotherms of the temperature distribution (curves in the xy-plane over which the temperature is a 
constant). 


(c) Plot several curves of the temperature as a function of x with y held constant. 


(d) Plot several curves of the temperature as a function of y with x held constant. 




10.12 Computed Tomography 


In this section we will see how constructing a cross-sectional view of a human body by analyzing X-ray scans leads to an inconsistent linear 
system. We present an iteration technique that provides an “approximate solution” of the linear system. 


Prerequisites 


Linear Systems 
Natural Logarithms 
Euclidean Space $R^n$


The basic problem of computed tomography is to construct an image of a cross section of the human body using data collected from many 
individual beams of X rays that are passed through the cross section. These data are processed by a computer, and the computed cross section is 
displayed on a video monitor. Figure 10.12.1 is a diagram of General Electric's CT system showing a patient prepared to have a cross section of 
his head scanned by X-ray beams. 


Figure 10.12.1 


Such a system is also known as a CAT scanner, for Computer-Aided Tomography scanner. Figure 10.12.2 shows a typical cross section of a 
human head produced by the system. 


Figure 10.12.2 


The first commercial system of computed tomography for medical use was developed in 1971 by G. N. Hounsfield of EMI, Ltd., in England. In 
1979, Hounsfield and A. M. Cormack were awarded the Nobel Prize for their pioneering work in the field. As we will see in this section, the
construction of a cross section, or tomograph, requires the solution of a large linear system of equations. Certain algorithms, called algebraic 
reconstruction techniques (ARTs), can be used to solve these linear systems, whose solutions yield the cross sections in digital form. 


Scanning Modes 


Unlike conventional X-ray pictures that are formed by X rays that are projected perpendicular to the plane of the picture, tomographs are
constructed from thousands of individual, hairline-thin X-ray beams that lie in the plane of the cross section. After they pass through the cross
section, the intensities of the X-ray beams are measured by an X-ray detector, and these measurements are relayed to a computer where they are
processed. Figures 10.12.3 and 10.12.4 illustrate two possible modes of scanning the cross section: the parallel mode and the fan-beam mode.
In the parallel mode a single X-ray source and X-ray detector pair are translated across the field of view containing the cross section, and many
measurements of the parallel beams are recorded. Then the source and detector pair are rotated through a small angle, and another set of
measurements is taken. This is repeated until the desired number of beam measurements is completed. For example, in the original 1971
machine, 160 parallel measurements were taken through 180 angles spaced 1° apart: a total of 160 × 180 = 28,800 beam measurements. Each
such scan took approximately $5\tfrac12$ minutes.




Figure 10.12.3 


Figure 10.12.4 


In the fan-beam mode of scanning, a single X-ray tube generates a fan of collimated beams whose intensities are measured simultaneously by 
an array of detectors on the other side of the field of view. The X-ray tube and detector array are rotated through many angles, and a set of 
measurements is taken at each angle until the scan is completed. In the General Electric CT system, which uses the fan-beam mode, each scan 
takes 1 second. 


Derivation of Equations 


To see how the cross section is reconstructed from the many individual beam measurements, refer to Figure 10.12.5. Here the field of view in 
which the cross section is situated has been divided into many square pixels (picture elements) numbered 1 through N as indicated. It is our 
desire to determine the X-ray density of each pixel. In the EMI system, 6400 pixels were used, arranged in a square 80 × 80 array. The G.E. CT
system uses 262,144 pixels in a 512 × 512 array, each pixel being about 1 mm on a side. After the densities of the pixels are determined by the
method we will describe, they are reproduced on a video monitor, with each pixel shaded a level of gray proportional to its X-ray density. 
Because different tissues within the human body have different X-ray densities, the video display clearly distinguishes the various tissues and 
organs within the cross section. 




Figure 10.12.5 


Figure 10.12.6 shows a single pixel with an X-ray beam of roughly the same width as the pixel passing squarely through it. The photons 
constituting the X-ray beam are absorbed by the tissue within the pixel at a rate proportional to the X-ray density of the tissue. Quantitatively, 
the X-ray density of the jth pixel is denoted by $x_j$ and is defined by

$$x_j = \ln\left(\frac{\text{number of photons entering the } j\text{th pixel}}{\text{number of photons leaving the } j\text{th pixel}}\right)$$

where “ln” denotes the natural logarithmic function. Using the logarithm property $\ln(a/b) = -\ln(b/a)$, we also have

$$x_j = -\ln\left(\text{fraction of photons that pass through the } j\text{th pixel without being absorbed}\right)$$


Figure 10.12.6 (labels: photons entering jth pixel, photons leaving jth pixel)


If the X-ray beam passes through an entire row of pixels (Figure 10.12.7), then the number of photons leaving one pixel is equal to the number 
of photons entering the next pixel in the row. If the pixels are numbered 1, 2, ..., n, then the additive property of the logarithmic function gives

$$\begin{aligned}
x_1 + x_2 + \cdots + x_n &= \ln\left(\frac{\text{number of photons entering the first pixel}}{\text{number of photons leaving the } n\text{th pixel}}\right) \\
&= -\ln\left(\text{fraction of photons that pass through the row of } n \text{ pixels without being absorbed}\right)
\end{aligned} \tag{1}$$


Thus, to determine the total X-ray density of a row of pixels, we simply sum the individual pixel densities. 
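
The additive property behind Equation 1 can be checked numerically. The following is a minimal Python sketch, not part of the original text: the photon counts are made-up values and `pixel_density` is an illustrative helper name.

```python
import math

def pixel_density(photons_in, photons_out):
    """X-ray density of one pixel: ln(photons entering / photons leaving)."""
    return math.log(photons_in / photons_out)

# Hypothetical photon counts at the successive boundaries of a row of three pixels:
# 1000 photons enter the first pixel, 800 leave it (and enter the second), and so on.
counts = [1000.0, 800.0, 400.0, 100.0]

# Density of each pixel in the row.
densities = [pixel_density(counts[j], counts[j + 1]) for j in range(len(counts) - 1)]

# Density of the whole row computed directly from the first and last counts.
row_density = math.log(counts[0] / counts[-1])

print("pixel densities:", densities)
print("sum of densities:", sum(densities))
print("row density     :", row_density)
```

The two printed totals agree because the intermediate photon counts cancel when the logarithms are added.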


Figure 10.12.7 (labels: photons entering first pixel, photons leaving nth pixel; first, second, third, ..., nth pixel)


Next, consider the X-ray beam in Figure 10.12.5. By the beam density of the ith beam of a scan, denoted by $b_i$, we mean

$$\begin{aligned}
b_i &= \ln\left(\frac{\text{number of photons of the } i\text{th beam entering the detector without the cross section in the field of view}}{\text{number of photons of the } i\text{th beam entering the detector with the cross section in the field of view}}\right) \\
&= -\ln\left(\text{fraction of photons of the } i\text{th beam that pass through the cross section without being absorbed}\right)
\end{aligned} \tag{2}$$

The numerator in the first expression for $b_i$ is obtained by performing a calibration scan without the cross section in the field of view. The
resulting detector measurements are stored within the computer's memory. Then a clinical scan is performed with the cross section in the field
of view, the $b_i$'s of all the beams constituting the scan are computed, and the values are stored for further processing.


For each beam that passes squarely through a row of pixels, we must have

$$\left(\begin{array}{c}\text{fraction of photons of the beam} \\ \text{that pass through the row of} \\ \text{pixels without being absorbed}\end{array}\right) = \left(\begin{array}{c}\text{fraction of photons of the beam} \\ \text{that pass through the cross section} \\ \text{without being absorbed}\end{array}\right)$$

Thus, if the ith beam passes squarely through a row of n pixels, then it follows from Equations 1 and 2 that

$$x_1 + x_2 + \cdots + x_n = b_i$$

In this equation, $b_i$ is known from the clinical and calibration measurements, and $x_1, x_2, \ldots, x_n$ are unknown pixel densities that must be
determined.


More generally, if the ith beam passes squarely through a row (or column) of pixels with numbers $j_1, j_2, \ldots, j_n$, then we have

$$x_{j_1} + x_{j_2} + \cdots + x_{j_n} = b_i$$

If we set

$$a_{ij} = \begin{cases} 1, & \text{if } j = j_1, j_2, \ldots, j_n \\ 0, & \text{otherwise} \end{cases}$$

then we can write this equation as

$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{iN}x_N = b_i \tag{3}$$

We will refer to Equation 3 as the ith beam equation.
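
As an illustration of how the coefficients of one beam equation arise, here is a small Python sketch for a beam that passes squarely through a known list of pixels. The function names `beam_row` and `beam_density`, the pixel numbers, and the trial densities are assumptions made for this example only.

```python
def beam_row(pixel_numbers, n_pixels):
    """Coefficients a_i1, ..., a_iN of one beam equation:
    1 for each pixel the beam crosses squarely, 0 otherwise (pixels numbered from 1)."""
    crossed = set(pixel_numbers)
    return [1 if j in crossed else 0 for j in range(1, n_pixels + 1)]

def beam_density(row, x):
    """Left side of Equation 3: a_i1*x_1 + ... + a_iN*x_N."""
    return sum(a * xj for a, xj in zip(row, x))

# A beam crossing pixels 7, 8, 9 of a field of view with N = 9 pixels.
row = beam_row([7, 8, 9], 9)

# Hypothetical pixel densities, all equal to 0.5.
x = [0.5] * 9

print(row)
print(beam_density(row, x))
```

With 0/1 coefficients the left side of the beam equation is simply the sum of the densities of the pixels that the beam crosses.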


Referring to Figure 10.12.5, however, we see that the beams of a scan do not necessarily pass through a row or column of pixels squarely. 
Instead, a typical beam passes diagonally through each pixel in its path. There are many ways to take this into account. In Figure 10.12.8 we 
outline three methods of defining the quantities $a_{ij}$ that appear in Equation 3, each of which reduces to our previous definition when the beam
passes squarely through a row or column of pixels. Reading down the figure, each method is more exact than its predecessor, but with 
successively more computational difficulty. 


Center-of-Pixel Method

$$a_{ij} = \begin{cases} 1, & \text{if the } i\text{th beam passes through the center of the } j\text{th pixel} \\ 0, & \text{otherwise} \end{cases}$$

Center Line Method

$$a_{ij} = \frac{\text{length of the center line of the } i\text{th beam that lies in the } j\text{th pixel}}{\text{width of the } j\text{th pixel}}$$

Area Method

$$a_{ij} = \frac{\text{area of the } i\text{th beam that lies in the } j\text{th pixel}}{\text{area of the } i\text{th beam that would lie in the } j\text{th pixel if the } i\text{th beam were to cross the pixel squarely}}$$

Figure 10.12.8


Using any one of the three methods to define the $a_{ij}$'s in the ith beam equation, we can write the set of M beam equations in a complete scan as

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1N}x_N &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2N}x_N &= b_2 \\
&\ \,\vdots \\
a_{M1}x_1 + a_{M2}x_2 + \cdots + a_{MN}x_N &= b_M
\end{aligned} \tag{4}$$


In this way we have a linear system of M equations (the M beam equations) in N unknowns (the N pixel densities). 


Depending on the number of beams and pixels used, we can have $M > N$, $M = N$, or $M < N$. We will consider only the case $M > N$, the
so-called overdetermined case, in which there are more beams in the scan than pixels in the field of view. Because of inherent modeling and 
experimental errors in the problem, we should not expect our linear system to have an exact mathematical solution for the pixel densities. In the 
next section we attempt to find an “approximate” solution to this linear system. 


Algebraic Reconstruction Techniques 


There have been many mathematical algorithms devised to treat the overdetermined linear system 4. The one we will describe belongs to the 
class of so-called Algebraic Reconstruction Techniques (ARTs). This method, which can be traced to an iterative technique originally 
introduced by S. Kaczmarz in 1937, was the one used in the first commercial machine. To introduce this technique, consider the following 
system of three equations in two unknowns: 


$$\begin{aligned}
L_1&: \quad x_1 + x_2 = 2 \\
L_2&: \quad x_1 - 2x_2 = -2 \\
L_3&: \quad 3x_1 - x_2 = 3
\end{aligned} \tag{5}$$

The lines $L_1$, $L_2$, $L_3$ determined by these three equations are plotted in the $x_1x_2$-plane. As shown in Figure 10.12.9a, the three lines do not have
a common intersection, and so the three equations do not have an exact solution. However, the points $(x_1, x_2)$ on the shaded triangle formed by
the three lines are all situated “near” these three lines and can be thought of as constituting “approximate” solutions to our system. The
following iterative procedure describes a geometric construction for generating points on the boundary of that triangular region (Figure
10.12.9b):


Algorithm 1

Step 0 Choose an arbitrary starting point $\mathbf{x}_0$ in the $x_1x_2$-plane.

Step 1 Project $\mathbf{x}_0$ orthogonally onto the first line $L_1$ and call the projection $\mathbf{x}_1^{(1)}$. The superscript (1) indicates that this is the first of several
cycles through the steps.

Step 2 Project $\mathbf{x}_1^{(1)}$ orthogonally onto the second line $L_2$ and call the projection $\mathbf{x}_2^{(1)}$.

Step 3 Project $\mathbf{x}_2^{(1)}$ orthogonally onto the third line $L_3$ and call the projection $\mathbf{x}_3^{(1)}$.

Step 4 Take $\mathbf{x}_3^{(1)}$ as the new value of $\mathbf{x}_0$ and cycle through Steps 1 through 3 again. In the second cycle, label the projected points $\mathbf{x}_1^{(2)}$, $\mathbf{x}_2^{(2)}$,
$\mathbf{x}_3^{(2)}$; in the third cycle, label the projected points $\mathbf{x}_1^{(3)}$, $\mathbf{x}_2^{(3)}$, $\mathbf{x}_3^{(3)}$; and so forth.


This algorithm generates three sequences of points

$$\begin{aligned}
L_1&: \quad \mathbf{x}_1^{(1)}, \mathbf{x}_1^{(2)}, \mathbf{x}_1^{(3)}, \ldots \\
L_2&: \quad \mathbf{x}_2^{(1)}, \mathbf{x}_2^{(2)}, \mathbf{x}_2^{(3)}, \ldots \\
L_3&: \quad \mathbf{x}_3^{(1)}, \mathbf{x}_3^{(2)}, \mathbf{x}_3^{(3)}, \ldots
\end{aligned}$$

that lie on the three lines $L_1$, $L_2$, and $L_3$, respectively. It can be shown that as long as the three lines are not all parallel, then the first sequence
converges to a point $\mathbf{x}_1^*$ on $L_1$, the second sequence converges to a point $\mathbf{x}_2^*$ on $L_2$, and the third sequence converges to a point $\mathbf{x}_3^*$ on $L_3$ (Figure
10.12.9c). These three limit points form what is called the limit cycle of the iterative process. It can be shown that the limit cycle is independent
of the starting point $\mathbf{x}_0$.


Figure 10.12.9 


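Next we discuss the specific formulas needed to effect the orthogonal projections in Algorithm 1. First, because the equation of a line in $x_1x_2$-space is

$$a_1x_1 + a_2x_2 = b$$

we can express it in vector form as

$$\mathbf{a}^T\mathbf{x} = b$$

where

$$\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} \quad \text{and} \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$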

The following theorem gives the necessary projection formula (Exercise 5). 


THEOREM 10.12.1 Orthogonal Projection Formula 


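Let L be a line in $R^2$ with equation $\mathbf{a}^T\mathbf{x} = b$, and let $\mathbf{x}^*$ be any point in $R^2$ (Figure 10.12.10). Then the orthogonal projection, $\mathbf{x}_p$, of
$\mathbf{x}^*$ onto L is given by

$$\mathbf{x}_p = \mathbf{x}^* + \frac{b - \mathbf{a}^T\mathbf{x}^*}{\mathbf{a}^T\mathbf{a}}\,\mathbf{a}$$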


Figure 10.12.10 
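
The projection formula of Theorem 10.12.1 can be spot-checked numerically. The sketch below is an illustration only, written in plain Python; the names `dot` and `project` are chosen for this example, and the line and point are the first line of system 5 and the point (1, 3).

```python
def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def project(a, b, x):
    """Orthogonal projection of x onto the line (or hyperplane) a^T x = b,
    computed as x + ((b - a^T x) / (a^T a)) a."""
    t = (b - dot(a, x)) / dot(a, a)
    return [xi + t * ai for xi, ai in zip(x, a)]

a, b = [1.0, 1.0], 2.0      # the line x1 + x2 = 2
x_star = [1.0, 3.0]         # point to be projected
x_p = project(a, b, x_star)

# The projection should lie on the line ...
residual = dot(a, x_p) - b
# ... and x_p - x_star should be parallel to a (zero 2-D cross product).
d = [x_p[0] - x_star[0], x_p[1] - x_star[1]]
cross = d[0] * a[1] - d[1] * a[0]

print("x_p =", x_p, " residual =", residual, " cross =", cross)
```

These two checks are numerical counterparts of the two properties that characterize an orthogonal projection onto the line; they do not replace a proof.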


EXAMPLE 1 Using Algorithm 1


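We can use Algorithm 1 to find an approximate solution of the linear system given in 5 and illustrated in Figure 10.12.9. If we
write the equations of the three lines as

$$\begin{aligned}
L_1&: \quad \mathbf{a}_1^T\mathbf{x} = b_1 \\
L_2&: \quad \mathbf{a}_2^T\mathbf{x} = b_2 \\
L_3&: \quad \mathbf{a}_3^T\mathbf{x} = b_3
\end{aligned}$$

where

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \mathbf{a}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \mathbf{a}_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}, \quad \mathbf{a}_3 = \begin{bmatrix} 3 \\ -1 \end{bmatrix}, \quad b_1 = 2, \quad b_2 = -2, \quad b_3 = 3$$

then, using Theorem 10.12.1, we can express the iteration scheme in Algorithm 1 as

$$\mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)}}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k, \quad k = 1, 2, 3$$

where p = 1 for the first cycle of iterates, p = 2 for the second cycle of iterates, and so forth. After each cycle of iterates (i.e.,
after $\mathbf{x}_3^{(p)}$ is computed), the next cycle of iterates is begun with $\mathbf{x}_0^{(p+1)}$ set equal to $\mathbf{x}_3^{(p)}$.


Table 1 gives the numerical results of six cycles of iterations starting with the initial point $\mathbf{x}_0 = (1, 3)$.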


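Table 1

Iterate       |   x1     |   x2
--------------+----------+---------
x_0           | 1.00000  | 3.00000
x_1^(1)       |  .00000  | 2.00000
x_2^(1)       |  .40000  | 1.20000
x_3^(1)       | 1.30000  |  .90000
x_1^(2)       | 1.20000  |  .80000
x_2^(2)       |  .88000  | 1.44000
x_3^(2)       | 1.42000  | 1.26000
x_1^(3)       | 1.08000  |  .92000
x_2^(3)       |  .83200  | 1.41600
x_3^(3)       | 1.40800  | 1.22400
x_1^(4)       | 1.09200  |  .90800
x_2^(4)       |  .83680  | 1.41840
x_3^(4)       | 1.40920  | 1.22760
x_1^(5)       | 1.09080  |  .90920
x_2^(5)       |  .83632  | 1.41816
x_3^(5)       | 1.40908  | 1.22724
x_1^(6)       | 1.09092  |  .90908
x_2^(6)       |  .83637  | 1.41818
x_3^(6)       | 1.40909  | 1.22728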


Using certain techniques that are impractical for large linear systems, we can show the exact values of the points of the limit cycle
in this example to be

$$\begin{aligned}
\mathbf{x}_1^* &= \left(\tfrac{12}{11}, \tfrac{10}{11}\right) = (1.09090\ldots, .90909\ldots) \\
\mathbf{x}_2^* &= \left(\tfrac{46}{55}, \tfrac{78}{55}\right) = (.83636\ldots, 1.41818\ldots) \\
\mathbf{x}_3^* &= \left(\tfrac{31}{22}, \tfrac{27}{22}\right) = (1.40909\ldots, 1.22727\ldots)
\end{aligned}$$

It can be seen that the sixth cycle of iterates provides an excellent approximation to the limit cycle. Any one of the three iterates
$\mathbf{x}_1^{(6)}$, $\mathbf{x}_2^{(6)}$, $\mathbf{x}_3^{(6)}$ can be used as an approximate solution of the linear system. (The large discrepancies in the values of $\mathbf{x}_1^{(6)}$, $\mathbf{x}_2^{(6)}$, and
$\mathbf{x}_3^{(6)}$ are due to the artificial nature of this illustrative example. In practical problems, these discrepancies would be much smaller.)
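
The iteration of Example 1 is short enough to carry out in a few lines of code. The following Python sketch is one possible way to run the six cycles; the names `project`, `art_cycles`, and `LINES` are illustrative and do not come from the text.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def project(a, b, x):
    """Orthogonal projection of x onto the line a^T x = b (Theorem 10.12.1)."""
    t = (b - dot(a, x)) / dot(a, a)
    return [xi + t * ai for xi, ai in zip(x, a)]

# The three lines of system 5, each stored as (a_k, b_k).
LINES = [([1.0, 1.0], 2.0), ([1.0, -2.0], -2.0), ([3.0, -1.0], 3.0)]

def art_cycles(x0, lines, n_cycles):
    """Run n_cycles cycles of Algorithm 1.
    Returns a list of cycles; cycle p-1 holds the iterates x_1^(p), x_2^(p), ..."""
    history = []
    x = list(x0)
    for _ in range(n_cycles):
        cycle = []
        for a, b in lines:
            x = project(a, b, x)   # project the latest point onto the next line
            cycle.append(x)
        history.append(cycle)      # the last point of this cycle starts the next one
    return history

history = art_cycles([1.0, 3.0], LINES, 6)
for p, cycle in enumerate(history, start=1):
    for k, (x1, x2) in enumerate(cycle, start=1):
        print(f"x_{k}^({p}) = ({x1:.5f}, {x2:.5f})")
```

Changing the starting point passed to `art_cycles` is a quick way to observe that the later cycles settle toward the same three points.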


To generalize Algorithm 1 so that it applies to an overdetermined system of M equations in N unknowns,

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1N}x_N &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2N}x_N &= b_2 \\
&\ \,\vdots \\
a_{M1}x_1 + a_{M2}x_2 + \cdots + a_{MN}x_N &= b_M
\end{aligned} \tag{6}$$

we introduce column vectors $\mathbf{x}$ and $\mathbf{a}_i$ as follows:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}, \quad \mathbf{a}_i = \begin{bmatrix} a_{i1} \\ a_{i2} \\ \vdots \\ a_{iN} \end{bmatrix}, \quad i = 1, 2, \ldots, M$$

With these vectors, the M equations constituting our linear system 6 can be written in vector form as

$$\mathbf{a}_i^T\mathbf{x} = b_i, \quad i = 1, 2, \ldots, M$$

Each of these M equations defines what is called a hyperplane in the N-dimensional Euclidean space $R^N$. In general these M hyperplanes have
no common intersection, and so we seek instead some point in $R^N$ that is reasonably “close” to all of them. Such a point will constitute an
approximate solution of the linear system, and its N entries will determine approximate pixel densities with which to form the desired cross
section.


As in the two-dimensional case, we will introduce an iterative process that generates cycles of successive orthogonal projections onto the
hyperplanes beginning with some arbitrary initial point in $R^N$. Our notation for these successive iterates is

$$\mathbf{x}_k^{(p)} = \text{the iterate lying on the } k\text{th hyperplane generated during the } p\text{th cycle of iterations}$$

The algorithm is as follows:

Algorithm 2

Step 0 Choose any point in $R^N$ and label it $\mathbf{x}_0^{(1)}$.

Step 1 For the first cycle of iterates, set p = 1.

Step 2 For k = 1, 2, ..., M, compute

$$\mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)}}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k$$

Step 3 Set $\mathbf{x}_0^{(p+1)} = \mathbf{x}_M^{(p)}$.

Step 4 Increase the cycle number p by 1 and return to Step 2.


In Step 2 the iterate $\mathbf{x}_k^{(p)}$ is called the orthogonal projection of $\mathbf{x}_{k-1}^{(p)}$ onto the hyperplane $\mathbf{a}_k^T\mathbf{x} = b_k$. Consequently, as in the two-dimensional
case, this algorithm determines a sequence of orthogonal projections from one hyperplane onto the next in which we cycle back to the first
hyperplane after each projection onto the last hyperplane.


It can be shown that if the vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_M$ span $R^N$, then the iterates $\mathbf{x}_M^{(1)}, \mathbf{x}_M^{(2)}, \mathbf{x}_M^{(3)}, \ldots$ lying on the Mth hyperplane will converge to a
point $\mathbf{x}_M^*$ on that hyperplane which does not depend on the choice of the initial point $\mathbf{x}_0^{(1)}$. In computed tomography, one of the iterates $\mathbf{x}_M^{(p)}$ for p
sufficiently large is taken as an approximate solution of the linear system for the pixel densities.


Note that for the center-of-pixel method, the scalar quantity $\mathbf{a}_k^T\mathbf{a}_k$ appearing in the equation in Step 2 of the algorithm is simply the number of
pixels in which the kth beam passes through the center. Similarly, note that the scalar quantity

$$b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)}$$

in that same equation can be interpreted as the excess kth beam density that results if the pixel densities are set equal to the entries of $\mathbf{x}_{k-1}^{(p)}$. This
provides the following interpretation of our ART iteration scheme for the center-of-pixel method: Generate the pixel densities of each iterate by
distributing the excess beam density of successive beams in the scan evenly among those pixels in which the beam passes through the center.
When the last beam in the scan has been reached, return to the first beam and continue.


EXAMPLE 2 Using Algorithm 2


We can use Algorithm 2 to find the unknown pixel densities of the 9 pixels arranged in the 3 x 3 array illustrated in Figure 
10.12.11. These 9 pixels are scanned using the parallel mode with 12 beams whose measured beam densities are indicated in the 
figure. We choose the center-of-pixel method to set up the 12 beam equations. (In Exercises 7 and 8, you are asked to set up the 
beam equations using the center line and area methods.) As you can verify, the beam equations are 


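$$\begin{aligned}
x_7 + x_8 + x_9 &= 13.00 & \qquad x_3 + x_6 + x_9 &= 18.00 \\
x_4 + x_5 + x_6 &= 15.00 & \qquad x_2 + x_5 + x_8 &= 12.00 \\
x_1 + x_2 + x_3 &= 8.00 & \qquad x_1 + x_4 + x_7 &= 6.00 \\
x_6 + x_8 + x_9 &= 14.79 & \qquad x_2 + x_3 + x_6 &= 10.51 \\
x_3 + x_5 + x_7 &= 14.31 & \qquad x_1 + x_5 + x_9 &= 16.13 \\
x_1 + x_2 + x_4 &= 3.81 & \qquad x_4 + x_7 + x_8 &= 7.04
\end{aligned}$$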


Table 2 illustrates the results of the iteration scheme starting with an initial $\mathbf{x}_0^{(1)} = \mathbf{0}$. The table gives the values of each of the first
cycle of iterates, $\mathbf{x}_1^{(1)}$ through $\mathbf{x}_{12}^{(1)}$, but thereafter gives the iterates $\mathbf{x}_{12}^{(p)}$ only for various values of p. The iterates $\mathbf{x}_{12}^{(p)}$ start
repeating to two decimal places for $p \geq 45$, and so we take the entries of $\mathbf{x}_{12}^{(45)}$ as approximate values of the 9 pixel densities.
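
The computation described in Example 2 can be sketched in Python as follows. This is an illustrative implementation of Algorithm 2, not code from the text: the names `BEAMS`, `beam_row`, `project`, and `art_cycle` are assumptions, and the beams are listed in the order $b_1$ through $b_{12}$ with the pixel numbers taken from the 12 beam equations of the example.

```python
# Each beam: (numbers of the pixels whose centers it passes through, measured beam density).
BEAMS = [
    ([7, 8, 9], 13.00), ([4, 5, 6], 15.00), ([1, 2, 3], 8.00),
    ([6, 8, 9], 14.79), ([3, 5, 7], 14.31), ([1, 2, 4], 3.81),
    ([3, 6, 9], 18.00), ([2, 5, 8], 12.00), ([1, 4, 7], 6.00),
    ([2, 3, 6], 10.51), ([1, 5, 9], 16.13), ([4, 7, 8], 7.04),
]
N_PIXELS = 9

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def project(a, b, x):
    """Step 2 of Algorithm 2: orthogonal projection of x onto the hyperplane a^T x = b."""
    t = (b - dot(a, x)) / dot(a, a)
    return [xj + t * aj for xj, aj in zip(x, a)]

def beam_row(pixels):
    """Center-of-pixel coefficients: 1 for listed pixels, 0 otherwise."""
    return [1.0 if j in pixels else 0.0 for j in range(1, N_PIXELS + 1)]

A = [beam_row(pixels) for pixels, _ in BEAMS]
b = [density for _, density in BEAMS]

def art_cycle(x):
    """One full cycle: project onto hyperplanes 1, 2, ..., M in turn."""
    for a_k, b_k in zip(A, b):
        x = project(a_k, b_k, x)
    return x

# First iterate of the first cycle, starting from the zero vector.
first_iterate = project(A[0], b[0], [0.0] * N_PIXELS)

# Forty-five complete cycles, also starting from the zero vector.
x = [0.0] * N_PIXELS
for p in range(45):
    x = art_cycle(x)

print("x_1^(1)   =", [round(v, 2) for v in first_iterate])
print("x_12^(45) =", [round(v, 2) for v in x])
```

The first projection spreads the excess density 13.00 of the first beam evenly over pixels 7, 8, and 9, which is the center-of-pixel interpretation given above. Because every coefficient is 0 or 1, `dot(a, a)` is just the number of pixels the beam crosses.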


Figure 10.12.11 (the 3 × 3 pixel array with the 12 beams and their measured beam densities $b_1$ through $b_{12}$)


Table 2 Pixel Densities (the first cycle of iterates $\mathbf{x}_1^{(1)}$ through $\mathbf{x}_{12}^{(1)}$, followed by the iterates $\mathbf{x}_{12}^{(p)}$ for selected values of p)

We close this section by noting that the field of computed tomography is presently a very active research area. In fact, the ART scheme 
discussed here has been replaced in commercial systems by more sophisticated techniques that are faster and provide a more accurate view of 
the cross section. However, all the new techniques address the same basic mathematical problem: finding a good approximate solution of a 
large overdetermined inconsistent linear system of equations. 


Exercise Set 10.12 


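1. (a) Setting $\mathbf{x}_k^{(p)} = (x_{k1}^{(p)}, x_{k2}^{(p)})$, show that the three projection equations

$$\mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)}}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k, \quad k = 1, 2, 3$$

for the three lines in Equation 5 can be written as

$$\begin{aligned}
k = 1: \quad & x_{11}^{(p)} = \tfrac{1}{2}\left[2 + x_{01}^{(p)} - x_{02}^{(p)}\right] \\
& x_{12}^{(p)} = \tfrac{1}{2}\left[2 - x_{01}^{(p)} + x_{02}^{(p)}\right] \\
k = 2: \quad & x_{21}^{(p)} = \tfrac{1}{5}\left[-2 + 4x_{11}^{(p)} + 2x_{12}^{(p)}\right] \\
& x_{22}^{(p)} = \tfrac{1}{5}\left[4 + 2x_{11}^{(p)} + x_{12}^{(p)}\right] \\
k = 3: \quad & x_{31}^{(p)} = \tfrac{1}{10}\left[9 + x_{21}^{(p)} + 3x_{22}^{(p)}\right] \\
& x_{32}^{(p)} = \tfrac{1}{10}\left[-3 + 3x_{21}^{(p)} + 9x_{22}^{(p)}\right]
\end{aligned}$$

where $(x_{01}^{(p+1)}, x_{02}^{(p+1)}) = (x_{31}^{(p)}, x_{32}^{(p)})$ for p = 1, 2, ....

(b) Show that the three pairs of equations in part (a) can be combined to produce

$$\begin{aligned}
x_{31}^{(p)} &= \tfrac{1}{20}\left[28 + x_{31}^{(p-1)} - x_{32}^{(p-1)}\right] \\
x_{32}^{(p)} &= \tfrac{1}{20}\left[24 + 3x_{31}^{(p-1)} - 3x_{32}^{(p-1)}\right]
\end{aligned} \qquad p = 1, 2, \ldots$$

where $(x_{31}^{(0)}, x_{32}^{(0)}) = (x_{01}^{(1)}, x_{02}^{(1)}) = \mathbf{x}_0^{(1)}$. [Note: Using this pair of equations, we can perform one complete cycle of three orthogonal
projections in a single step.]

(c) Because $\mathbf{x}_3^{(p)}$ tends to the limit point $\mathbf{x}_3^*$ as $p \to \infty$, the equations in part (b) become

$$\begin{aligned}
x_{31}^* &= \tfrac{1}{20}\left[28 + x_{31}^* - x_{32}^*\right] \\
x_{32}^* &= \tfrac{1}{20}\left[24 + 3x_{31}^* - 3x_{32}^*\right]
\end{aligned}$$

as $p \to \infty$. Solve this linear system for $\mathbf{x}_3^* = (x_{31}^*, x_{32}^*)$. [Note: The simplifications of the ART formulas described in this exercise are
impractical for the large linear systems that arise in realistic computed tomography problems.]


Answer:


(c) $\mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)$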


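2. Use the result of Exercise 1(b) to find $\mathbf{x}_3^{(1)}, \mathbf{x}_3^{(2)}, \ldots, \mathbf{x}_3^{(6)}$ to five decimal places in Example 1 using the following initial points:
(a) $\mathbf{x}_0 = (0, 0)$
(b) $\mathbf{x}_0 = (1, 1)$
(c) $\mathbf{x}_0 = (148, -15)$


Answer:


(a) $\mathbf{x}_3^{(1)}$ = (1.40000, 1.20000)
$\mathbf{x}_3^{(2)}$ = (1.41000, 1.23000)
$\mathbf{x}_3^{(3)}$ = (1.40900, 1.22700)
$\mathbf{x}_3^{(4)}$ = (1.40910, 1.22730)
$\mathbf{x}_3^{(5)}$ = (1.40909, 1.22727)
$\mathbf{x}_3^{(6)}$ = (1.40909, 1.22727)

(b) Same as part (a)

(c) $\mathbf{x}_3^{(1)}$ = (9.55000, 25.65000)
$\mathbf{x}_3^{(2)}$ = (.59500, -1.21500)
$\mathbf{x}_3^{(3)}$ = (1.49050, 1.47150)
$\mathbf{x}_3^{(4)}$ = (1.40095, 1.20285)
$\mathbf{x}_3^{(5)}$ = (1.40991, 1.22972)
$\mathbf{x}_3^{(6)}$ = (1.40901, 1.22703)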

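3. (a) Show directly that the points of the limit cycle in Example 1,

$$\mathbf{x}_1^* = \left(\tfrac{12}{11}, \tfrac{10}{11}\right), \quad \mathbf{x}_2^* = \left(\tfrac{46}{55}, \tfrac{78}{55}\right), \quad \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)$$

form a triangle whose vertices lie on the lines $L_1$, $L_2$, and $L_3$ and whose sides are perpendicular to these lines (Figure 10.12.9c).


(b) Using the equations derived in Exercise 1(a), show that if $\mathbf{x}_0^{(1)} = \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)$, then

$$\begin{aligned}
\mathbf{x}_1^{(1)} &= \mathbf{x}_1^* = \left(\tfrac{12}{11}, \tfrac{10}{11}\right) \\
\mathbf{x}_2^{(1)} &= \mathbf{x}_2^* = \left(\tfrac{46}{55}, \tfrac{78}{55}\right) \\
\mathbf{x}_3^{(1)} &= \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)
\end{aligned}$$

[Note: Either part of this exercise shows that successive orthogonal projections of any point on the limit cycle will move around the
limit cycle indefinitely.]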


4. The following three lines in the $x_1x_2$-plane,

$$\begin{aligned}
L_1&: \quad x_2 = 1 \\
L_2&: \quad x_1 - x_2 = 2 \\
L_3&: \quad x_1 - x_2 = 0
\end{aligned}$$

do not have a common intersection. Draw an accurate sketch of the three lines and graphically perform several cycles of the orthogonal
projections described in Algorithm 1, beginning with the initial point $\mathbf{x}_0 = (0, 0)$. On the basis of your sketch, determine the three points of
the limit cycle.


Answer:


$\mathbf{x}_1^* = (1, 1)$, $\mathbf{x}_2^* = (2, 0)$, $\mathbf{x}_3^* = (1, 1)$

5. Prove Theorem 10.12.1 by verifying that
(a) the point $\mathbf{x}_p$ as defined in the theorem lies on the line $\mathbf{a}^T\mathbf{x} = b$ (i.e., $\mathbf{a}^T\mathbf{x}_p = b$).

(b) the vector $\mathbf{x}_p - \mathbf{x}^*$ is orthogonal to the line $\mathbf{a}^T\mathbf{x} = b$ (i.e., $\mathbf{x}_p - \mathbf{x}^*$ is parallel to $\mathbf{a}$).


6. As stated in the text, the iterates $\mathbf{x}_M^{(1)}, \mathbf{x}_M^{(2)}, \mathbf{x}_M^{(3)}, \ldots$ defined in Algorithm 2 will converge to a unique limit point $\mathbf{x}_M^*$ if the vectors
$\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_M$ span $R^N$. Show that if this is the case and if the center-of-pixel method is used, then the center of each of the N pixels in the
field of view is crossed by at least one of the M beams in the scan.


7. Construct the 12 beam equations in Example 2 using the center line method. Assume that the distance between the center lines of adjacent 
beams is equal to the width of a single pixel. 


Answer: 


$$\begin{aligned}
x_7 + x_8 + x_9 &= 13.00 \\
x_4 + x_5 + x_6 &= 15.00 \\
x_1 + x_2 + x_3 &= 8.00 \\
.82843(x_6 + x_8) + .58579x_9 &= 14.79 \\
1.41421(x_3 + x_5 + x_7) &= 14.31 \\
.82843(x_2 + x_4) + .58579x_1 &= 3.81 \\
x_3 + x_6 + x_9 &= 18.00 \\
x_2 + x_5 + x_8 &= 12.00 \\
x_1 + x_4 + x_7 &= 6.00 \\
.82843(x_2 + x_6) + .58579x_3 &= 10.51 \\
1.41421(x_1 + x_5 + x_9) &= 16.13 \\
.82843(x_4 + x_8) + .58579x_7 &= 7.04
\end{aligned}$$


8. Construct the 12 beam equations in Example 2 using the area method. Assume that the width of each beam is equal to the width of a single 
pixel and that the distance between the center lines of adjacent beams is also equal to the width of a single pixel. 


Answer: 


$$\begin{aligned}
x_7 + x_8 + x_9 &= 13.00 \\
x_4 + x_5 + x_6 &= 15.00 \\
x_1 + x_2 + x_3 &= 8.00 \\
.04289(x_3 + x_5 + x_7) + .75000(x_6 + x_8) + .61396x_9 &= 14.79 \\
.91421(x_3 + x_5 + x_7) + .25000(x_2 + x_4 + x_6 + x_8) &= 14.31 \\
.04289(x_3 + x_5 + x_7) + .75000(x_2 + x_4) + .61396x_1 &= 3.81 \\
x_3 + x_6 + x_9 &= 18.00 \\
x_2 + x_5 + x_8 &= 12.00 \\
x_1 + x_4 + x_7 &= 6.00 \\
.04289(x_1 + x_5 + x_9) + .75000(x_2 + x_6) + .61396x_3 &= 10.51 \\
.91421(x_1 + x_5 + x_9) + .25000(x_2 + x_4 + x_6 + x_8) &= 16.13 \\
.04289(x_1 + x_5 + x_9) + .75000(x_4 + x_8) + .61396x_7 &= 7.04
\end{aligned}$$


Section 10.12 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or 
Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each 
exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you 
with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your 
technology utility to solve many of the problems in the regular exercise sets. 


T1. Given the set of equations

$$a_kx + b_ky = c_k$$

for k = 1, 2, 3, ..., n (with n > 2), let us consider the following algorithm for obtaining an approximate solution to the system.


1. Solve all possible pairs of equations

$$a_ix + b_iy = c_i \quad \text{and} \quad a_jx + b_jy = c_j$$

for i, j = 1, 2, 3, ..., n and i < j for their unique solutions. This leads to

$$\tfrac{1}{2}n(n - 1)$$

solutions, which we label as

$$(x_{ij}, y_{ij})$$

for i, j = 1, 2, 3, ..., n and i < j.


2. Construct the geometric center of these points defined by

$$(x_C, y_C) = \left(\frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} x_{ij},\ \frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} y_{ij}\right)$$

and use this as the approximate solution to the original system.


Use this algorithm to approximate the solution to the system

$$\begin{aligned}
x + y &= 2 \\
x - 2y &= -2 \\
3x - y &= 3
\end{aligned}$$

and compare your results to those in this section.
T2. (Calculus required) Given the set of equations

$$a_kx + b_ky = c_k$$

for k = 1, 2, 3, ..., n (with n > 2), let us consider the following least squares algorithm for obtaining an approximate solution $(x^*, y^*)$ to the
system. Given a point $(\alpha, \beta)$ and the line $a_ix + b_iy = c_i$, the distance from this point to the line is given by

$$\frac{|a_i\alpha + b_i\beta - c_i|}{\sqrt{a_i^2 + b_i^2}}$$

If we define a function f(x, y) by

$$f(x, y) = \sum_{i=1}^{n} \frac{(a_ix + b_iy - c_i)^2}{a_i^2 + b_i^2}$$

and then determine the point $(x^*, y^*)$ that minimizes this function, we will determine the point that is closest to each of these lines in a
summed least squares sense. Show that $x^*$ and $y^*$ are solutions to the system

$$\left(\sum_{i=1}^{n} \frac{a_i^2}{a_i^2 + b_i^2}\right)x^* + \left(\sum_{i=1}^{n} \frac{a_ib_i}{a_i^2 + b_i^2}\right)y^* = \sum_{i=1}^{n} \frac{a_ic_i}{a_i^2 + b_i^2}$$

and

$$\left(\sum_{i=1}^{n} \frac{a_ib_i}{a_i^2 + b_i^2}\right)x^* + \left(\sum_{i=1}^{n} \frac{b_i^2}{a_i^2 + b_i^2}\right)y^* = \sum_{i=1}^{n} \frac{b_ic_i}{a_i^2 + b_i^2}$$

Apply this algorithm to the system

$$\begin{aligned}
x + y &= 2 \\
x - 2y &= -2 \\
3x - y &= 3
\end{aligned}$$

and compare your results to those in this section.




10.13 Fractals 


In this section we will use certain classes of linear transformations to describe and generate intricate sets in the Euclidean plane. These sets, called fractals, are 
currently the focus of much mathematical and scientific research. 


Prerequisites 


Geometry of Linear Operators on $R^2$ (Section 4.11)
Euclidean Space $R^n$

Natural Logarithms 

Intuitive Understanding of Limits 


Fractals in the Euclidean Plane 


At the end of the nineteenth century and the beginning of the twentieth century, various bizarre and wild sets of points in the Euclidean plane began appearing in
mathematics. Although they were initially mathematical curiosities, these sets, called fractals, are rapidly growing in importance. It is now recognized that they reveal
a regularity in physical and biological phenomena previously dismissed as “random,” “noisy,” or “chaotic.” For example, fractals are all around us in the shapes of
clouds, mountains, coastlines, trees, and ferns.


In this section we give a brief description of certain types of fractals in the Euclidean plane $R^2$. Much of this description is an outgrowth of the work of two
mathematicians, Benoit B. Mandelbrot and Michael Barnsley, who are both active researchers in the field.


Self-Similar Sets 


To begin our study of fractals, we need to introduce some terminology about sets in $R^2$. We will call a set in $R^2$ bounded if it can be enclosed by a suitably large
circle (Figure 10.13.1) and closed if it contains all of its boundary points (Figure 10.13.2). Two sets in $R^2$ will be called congruent if they can be made to coincide
exactly by translating and rotating them appropriately within $R^2$ (Figure 10.13.3). We will also rely on your intuitive concept of overlapping and nonoverlapping
sets, as illustrated in Figure 10.13.4.


Figure 10.13.1 (a) Set enclosed by a circle. (b) This set cannot be enclosed by any circle.


Figure 10.13.2 The boundary points (solid color) lie in the set.


Figure 10.13.3 Congruent sets


Figure 10.13.4 (a) Overlapping sets (b) Nonoverlapping sets


If $T: R^2 \to R^2$ is the linear operator that scales by a factor of s (see Table 7 of Section 4.9), and if Q is a set in $R^2$, then the set T(Q) (the set of images of points in Q
under T) is called a dilation of the set Q if s > 1 and a contraction of Q if 0 < s < 1 (Figure 10.13.5). In either case we say that T(Q) is the set Q scaled by the factor
s.


Figure 10.13.5 A contraction of Q.


The types of fractals we will consider first are called self-similar. In general, we define a self-similar set in $R^2$ as follows:


DEFINITION 1 


A closed and bounded subset of the Euclidean plane $R^2$ is said to be self-similar if it can be expressed in the form

$$S = S_1 \cup S_2 \cup S_3 \cup \cdots \cup S_k \tag{1}$$

where $S_1, S_2, S_3, \ldots, S_k$ are nonoverlapping sets, each of which is congruent to S scaled by the same factor s (0 < s < 1).


If S is a self-similar set, then 1 is sometimes called a decomposition of S into nonoverlapping congruent sets.

EXAMPLE 1 Line Segment


A line segment in $R^2$ (Figure 10.13.6a) can be expressed as the union of two nonoverlapping congruent line segments (Figure 10.13.6b). In Figure
10.13.6b we have separated the two line segments slightly so that they can be seen more easily. Each of these two smaller line segments is congruent to
the original line segment scaled by a factor of $\frac{1}{2}$. Hence, a line segment is a self-similar set with $k = 2$ and $s = \frac{1}{2}$.


Figure 10.13.6


EXAMPLE 2 Square


A square (Figure 10.13.7a) can be expressed as the union of four nonoverlapping congruent squares (Figure 10.13.7b), where we have again separated
the smaller squares slightly. Each of the four smaller squares is congruent to the original square scaled by a factor of $\frac{1}{2}$. Hence, a square is a self-similar
set with $k = 4$ and $s = \frac{1}{2}$.


Figure 10.13.7 


EXAMPLE 3 Sierpinski Carpet


The set suggested by Figure 10.13.8a, the Sierpinski “carpet,” was first described by the Polish mathematician Waclaw Sierpinski (1882-1969). It can
be expressed as the union of eight nonoverlapping congruent subsets (Figure 10.13.8b), each of which is congruent to the original set scaled by a factor
of $\frac{1}{3}$. Hence, it is a self-similar set with $k = 8$ and $s = \frac{1}{3}$. Note that the intricate square-within-a-square pattern continues forever on a smaller and
smaller scale (although this can only be suggested in a figure such as the one shown).


Figure 10.13.8


EXAMPLE 4 Sierpinski Triangle


Figure 10.13.9a illustrates another set described by Sierpinski. It is a self-similar set with $k = 3$ and $s = \frac{1}{2}$ (Figure 10.13.9b). As with the Sierpinski
carpet, the intricate triangle-within-a-triangle pattern continues forever on a smaller and smaller scale.


Figure 10.13.9


The Sierpinski carpet and triangle have a more intricate structure than the line segment and the square in that they exhibit a pattern that is repeated indefinitely. This 
difference will be explored later in this section. 


Topological Dimension of a Set 


In Section 4.5 we defined the dimension of a subspace of a vector space to be the number of vectors in a basis, and we found that definition to coincide with our
intuitive sense of dimension. For example, the origin of $R^2$ is zero-dimensional, lines through the origin are one-dimensional, and $R^2$ itself is two-dimensional. This
definition of dimension is a special case of a more general concept called topological dimension, which is applicable to sets in $R^n$ that are not necessarily subspaces.
A precise definition of this concept is studied in a branch of mathematics called topology. Although that definition is beyond the scope of this text, we can state
informally that


* a point in $R^2$ has topological dimension zero;

* a curve in $R^2$ has topological dimension one;

* a region in $R^2$ has topological dimension two.

It can be proved that the topological dimension of a set in $R^n$ must be an integer between 0 and n, inclusive. In this text we will denote the topological dimension of a
set S by $d_T(S)$.


EXAMPLE 5 Topological Dimensions of Sets


Table 1 gives the topological dimensions of the sets studied in our earlier examples. The first two results in this table are intuitively obvious; however, 
the last two are not. Informally stated, the Sierpinski carpet and triangle both contain so many “holes” that those sets resemble web-like networks of 
lines rather than regions. Hence they have topological dimension one. The proofs are quite difficult. 


Table 1

Set S               | d_T(S)
--------------------+-------
Line segment        |   1
Square              |   2
Sierpinski carpet   |   1
Sierpinski triangle |   1


Hausdorff Dimension of a Self-Similar Set 


In 1919 the German mathematician Felix Hausdorff (1868-1942) gave an alternative definition for the dimension of an arbitrary set in $R^n$. His definition is quite
complicated, but for a self-similar set, it reduces to something rather simple:


DEFINITION 2


The Hausdorff dimension of a self-similar set S of form 1 is denoted by $d_H(S)$ and is defined by

$$d_H(S) = \frac{\ln k}{\ln(1/s)} \tag{2}$$

In this definition, “ln” denotes the natural logarithm function. Equation 2 can also be expressed as

$$s^{d_H(S)} = \frac{1}{k} \tag{3}$$

in which the Hausdorff dimension $d_H(S)$ appears as an exponent. Formula 3 is more helpful for interpreting the concept of Hausdorff dimension; it states, for
example, that if you scale a self-similar set by a factor of $s = \frac{1}{2}$, then its area (or more properly its measure) decreases by a factor of $\left(\frac{1}{2}\right)^{d_H(S)}$. Thus, scaling a line
segment by a factor of $\frac{1}{2}$ reduces its measure (length) by a factor of $\left(\frac{1}{2}\right)^1 = \frac{1}{2}$, and scaling a square region by a factor of $\frac{1}{2}$ reduces its measure (area) by a factor of
$\left(\frac{1}{2}\right)^2 = \frac{1}{4}$.
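
The passage from Equation 2 to Equation 3 is a short manipulation of logarithms. Writing $d = d_H(S)$ for brevity (a shorthand introduced only for this derivation), the steps can be laid out as:

$$\begin{aligned}
d = \frac{\ln k}{\ln(1/s)} \quad &\Longrightarrow \quad d\,\ln(1/s) = \ln k \\
&\Longrightarrow \quad \ln\left[(1/s)^{d}\right] = \ln k \\
&\Longrightarrow \quad (1/s)^{d} = k \\
&\Longrightarrow \quad s^{d} = \frac{1}{k}
\end{aligned}$$

The division in the first line is legitimate because $0 < s < 1$ gives $\ln(1/s) > 0$.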


Before proceeding to some examples, we should note a few facts about the Hausdorff dimension of a set:

* The topological dimension and Hausdorff dimension of a set need not be the same.

* The Hausdorff dimension of a set need not be an integer.

* The topological dimension of a set is less than or equal to its Hausdorff dimension; that is, $d_T(S) \leq d_H(S)$.


EXAMPLE 6 Hausdorff Dimensions of Sets 


Table 2 lists the Hausdorff dimensions of the sets studied in our earlier examples. 


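Table 2

Set S               | k | s   | d_H(S) = ln k / ln(1/s)
--------------------+---+-----+------------------------
Line segment        | 2 | 1/2 | ln 2/ln 2 = 1
Square              | 4 | 1/2 | ln 4/ln 2 = 2
Sierpinski carpet   | 8 | 1/3 | ln 8/ln 3 = 1.892...
Sierpinski triangle | 3 | 1/2 | ln 3/ln 2 = 1.584...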
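
The entries of Table 2 follow directly from Equation 2 and the values of k and s found in Examples 1 through 4. A minimal Python sketch of that arithmetic is given below; the function name `hausdorff_dimension` and the dictionary `SETS` are illustrative choices, not notation from the text.

```python
import math

def hausdorff_dimension(k, s):
    """Hausdorff dimension of a self-similar set made of k nonoverlapping
    copies of itself, each scaled by the factor s (0 < s < 1)."""
    return math.log(k) / math.log(1.0 / s)

# (k, s) for the four sets of Examples 1 through 4.
SETS = {
    "Line segment": (2, 1 / 2),
    "Square": (4, 1 / 2),
    "Sierpinski carpet": (8, 1 / 3),
    "Sierpinski triangle": (3, 1 / 2),
}

dims = {name: hausdorff_dimension(k, s) for name, (k, s) in SETS.items()}

for name, d in dims.items():
    print(f"{name:20s} d_H = {d:.4f}")
```

The line segment and square come out as whole numbers, while the two Sierpinski sets do not, which is the distinction the discussion that follows relies on.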


Fractals

Comparing Tables 1 and 2, we see that the Hausdorff and topological dimensions are equal for both the line segment and square but are unequal for the Sierpinski
carpet and triangle. In 1977 Benoit B. Mandelbrot suggested that sets for which the topological and Hausdorff dimensions differ must be quite complicated (as
Hausdorff had earlier suggested in 1919). Mandelbrot proposed calling such sets fractals, and he offered the following definition.


DEFINITION 3 


A fractal is a subset of a Euclidean space whose Hausdorff dimension and topological dimension are not equal. 


According to this definition, the Sierpinski carpet and Sierpinski triangle are fractals, whereas the line segment and square are not.


It follows from the preceding definition that a set whose Hausdorff dimension is not an integer must be a fractal (why?). However, we will see later that the converse 
is not true; that is, it is possible for a fractal to have an integer Hausdorff dimension. 


Similitudes 


We will now show how some techniques from linear algebra can be used to generate fractals. This linear algebra approach also leads to algorithms that can be 
exploited to draw fractals on a computer. We begin with a definition. 


DEFINITION 4 


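A similitude with scale factor s is a mapping of $R^2$ into $R^2$ of the form

$$T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = s\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix}$$

where s, $\theta$, e, and f are scalars.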


Geometrically, a similitude is a composition of three simpler mappings: a scaling by a factor of s, a rotation about the origin through an angle $\theta$, and a translation (e
units in the x-direction and f units in the y-direction). Figure 10.13.10 illustrates the effect of a similitude on the unit square U.
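
To see the three component mappings at work, the following Python sketch applies a similitude to the corners of the unit square. The function name `similitude` and the particular parameter values (s = 0.5, a quarter-turn rotation, and a translation by (1, 2)) are hypothetical choices made only for illustration.

```python
import math

def similitude(s, theta, e, f):
    """Return the mapping T(x, y) = s * R(theta) * (x, y) + (e, f),
    where R(theta) is the rotation about the origin through the angle theta."""
    c, sn = math.cos(theta), math.sin(theta)

    def T(point):
        x, y = point
        return (s * (c * x - sn * y) + e, s * (sn * x + c * y) + f)

    return T

# Hypothetical parameters: scale by 1/2, rotate 90 degrees, translate by (1, 2).
T = similitude(0.5, math.pi / 2, 1.0, 2.0)

# Corners of the unit square U, listed counterclockwise.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
image = [T(p) for p in square]

# Side lengths of the image; each side of U has length 1.
sides = [math.dist(image[i], image[(i + 1) % 4]) for i in range(4)]

print("image corners:", image)
print("side lengths :", sides)
```

Every side of the image has length s times the original side length, as expected of a scaled, rotated, and translated copy of the square.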


Figure 10.13.10 (a) Unit square. (b) Unit square after similitude (scaling, rotation, translation).


For our application to fractals, we will need only similitudes that are contractions, by which we mean that the scale factor s is restricted to the range 0 < s < 1.
Consequently, when we refer to similitudes we will always mean similitudes subject to this restriction. 


Similitudes are important in the study of fractals because of the following fact: 


If $T: R^2 \to R^2$ is a similitude with scale factor s and if S is a closed and bounded set in $R^2$, then the image T(S) of the set S under T is congruent to S scaled by s.


Recall from the definition of a self-similar set in $R^2$ that a closed and bounded set S in $R^2$ is self-similar if it can be expressed in the form

\[ S = S_1 \cup S_2 \cup S_3 \cup \cdots \cup S_k \]

where $S_1, S_2, S_3, \ldots, S_k$ are nonoverlapping sets each of which is congruent to S scaled by the same factor s (0 < s < 1) [see Equation 1]. In the following examples, we will find similitudes that produce the sets $S_1, S_2, S_3, \ldots, S_k$ from S for the line segment, square, Sierpinski carpet, and Sierpinski triangle.


EXAMPLE 7 Line Segment


We will take as our line segment the line segment S connecting the points (0, 0) and (1, 0) in the xy-plane (Figure 10.13.11a). Consider the two similitudes

\[ T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}, \qquad T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \frac{1}{2} \\ 0 \end{bmatrix} \tag{4} \]

both of which have s = 1/2 and θ = 0. In Figure 10.13.11b we show how these two similitudes map the unit square U. The similitude $T_1$ maps U onto the smaller square $T_1(U)$, and the similitude $T_2$ maps U onto the smaller square $T_2(U)$. At the same time, $T_1$ maps the line segment S onto the smaller line segment $T_1(S)$, and $T_2$ maps S onto the smaller nonoverlapping line segment $T_2(S)$. The union of these two smaller nonoverlapping line segments is precisely the original line segment S; that is,

\[ S = T_1(S) \cup T_2(S) \tag{5} \]


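Figure 10.13.11


EXAMPLE 8 Square


Let us consider the unit square U in the xy-plane (Figure 10.13.12a) and the following four similitudes, all having s = 1/2 and θ = 0:

\[ T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}, \qquad T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \frac{1}{2} \\ 0 \end{bmatrix} \]

\[ T_3\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 0 \\ \frac{1}{2} \end{bmatrix}, \qquad T_4\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \end{bmatrix} \tag{6} \]

The images of the unit square U under these four similitudes are the four squares shown in Figure 10.13.12b. Thus,

\[ U = T_1(U) \cup T_2(U) \cup T_3(U) \cup T_4(U) \tag{7} \]

is a decomposition of U into four nonoverlapping squares that are congruent to U scaled by the same scale factor (s = 1/2).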


Figure 10.13.12 


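EXAMPLE 9 Sierpinski Carpet


Let us consider a Sierpinski carpet S over the unit square U of the xy-plane (Figure 10.13.13a) and the following eight similitudes, all having s = 1/3 and θ = 0:

\[ T_i\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{3}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \end{bmatrix}, \qquad i = 1, 2, 3, \ldots, 8 \tag{8} \]

where the eight values of $\begin{bmatrix} e_i \\ f_i \end{bmatrix}$ are

\[ \begin{bmatrix} 0 \\ 0 \end{bmatrix},\ \begin{bmatrix} \frac{1}{3} \\ 0 \end{bmatrix},\ \begin{bmatrix} \frac{2}{3} \\ 0 \end{bmatrix},\ \begin{bmatrix} 0 \\ \frac{1}{3} \end{bmatrix},\ \begin{bmatrix} \frac{2}{3} \\ \frac{1}{3} \end{bmatrix},\ \begin{bmatrix} 0 \\ \frac{2}{3} \end{bmatrix},\ \begin{bmatrix} \frac{1}{3} \\ \frac{2}{3} \end{bmatrix},\ \begin{bmatrix} \frac{2}{3} \\ \frac{2}{3} \end{bmatrix} \]

The images of S under these eight similitudes are the eight sets shown in Figure 10.13.13b. Thus,

\[ S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_8(S) \tag{9} \]

is a decomposition of S into eight nonoverlapping sets that are congruent to S scaled by the same scale factor (s = 1/3).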


Figure 10.13.13 


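EXAMPLE 10 Sierpinski Triangle


Let us consider a Sierpinski triangle S fitted inside the unit square U of the xy-plane, as shown in Figure 10.13.14a, and the following three similitudes, all having s = 1/2 and θ = 0:

\[ T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}, \qquad T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \frac{1}{2} \\ 0 \end{bmatrix}, \qquad T_3\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \frac{1}{4} \\ \frac{1}{2} \end{bmatrix} \tag{10} \]

The images of S under these three similitudes are the three sets in Figure 10.13.14b. Thus,

\[ S = T_1(S) \cup T_2(S) \cup T_3(S) \tag{11} \]

is a decomposition of S into three nonoverlapping sets that are congruent to S scaled by the same scale factor (s = 1/2).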


Figure 10.13.14 


In the preceding examples we started with a specific set S and showed that it was self-similar by finding similitudes $T_1, T_2, T_3, \ldots, T_k$ with the same scale factor such that $T_1(S), T_2(S), T_3(S), \ldots, T_k(S)$ were nonoverlapping sets and such that

\[ S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_k(S) \tag{12} \]


The following theorem addresses the converse problem of determining a self-similar set from a collection of similitudes. 


THEOREM 10.13.1 


If $T_1, T_2, T_3, \ldots, T_k$ are contracting similitudes with the same scale factor, then there is a unique nonempty closed and bounded set S in the Euclidean plane such that

\[ S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_k(S) \]

Furthermore, if the sets $T_1(S), T_2(S), T_3(S), \ldots, T_k(S)$ are nonoverlapping, then S is self-similar.


Algorithms for Generating Fractals 


In general, there is no simple way to obtain the set S in the preceding theorem directly. We now describe an iterative procedure that will determine S from the 
similitudes that define it. We first give an example of the procedure and then give an algorithm for the general case. 


EXAMPLE 11 Sierpinski Carpet


Figure 10.13.15 shows the unit square region $S_0$ in the xy-plane, which will serve as an “initial” set for an iterative procedure for the construction of the Sierpinski carpet. The set $S_1$ in the figure is the result of mapping $S_0$ with each of the eight similitudes $T_i$ (i = 1, 2, ..., 8) in Equation 8 that determine the Sierpinski carpet. It consists of eight square regions, each of side length 1/3, surrounding an empty middle square. Next we apply the eight similitudes to $S_1$ and arrive at the set $S_2$. Similarly, applying the eight similitudes to $S_2$ results in the set $S_3$. If we continue this process indefinitely, the sequence of sets $S_1, S_2, S_3, \ldots$ will “converge” to a set S, which is the Sierpinski carpet.




Figure 10.13.15 


Remark Although we should properly give a definition of what it means for a sequence of sets to “converge” to a given set, an intuitive interpretation will suffice in 
this introductory treatment. 


Although we started in Figure 10.13.15 with the unit square region to arrive at the Sierpinski carpet, we could have started with any nonempty set $S_0$. The only restriction is that the set $S_0$ be closed and bounded. For example, if we start with the particular set $S_0$ shown in Figure 10.13.16, then $S_1$ is the set obtained by applying each of the eight similitudes in Equation 8. Applying the eight similitudes to $S_1$ results in the set $S_2$. As before, applying the eight similitudes indefinitely yields the Sierpinski carpet S as the limiting set.


Figure 10.13.16 


The general algorithm illustrated in the preceding example is as follows: Let $T_1, T_2, T_3, \ldots, T_k$ be contracting similitudes with the same scale factor, and for an arbitrary set Q in $R^2$, define the set $\mathcal{T}(Q)$ by

\[ \mathcal{T}(Q) = T_1(Q) \cup T_2(Q) \cup T_3(Q) \cup \cdots \cup T_k(Q) \]

The following algorithm generates a sequence of sets $S_0, S_1, \ldots, S_n, \ldots$ that converges to the set S in Theorem 10.13.1.

Algorithm 1
Step 0 Choose an arbitrary nonempty closed and bounded set $S_0$ in $R^2$.
Step 1 Compute $S_1 = \mathcal{T}(S_0)$.
Step 2 Compute $S_2 = \mathcal{T}(S_1)$.
Step 3 Compute $S_3 = \mathcal{T}(S_2)$.
...
Step n Compute $S_n = \mathcal{T}(S_{n-1})$.
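
The steps of Algorithm 1 can be sketched in Python for the Sierpinski carpet, starting from the unit square. This is only one possible representation, not the implementation used to draw the figures: each set is stored as a collection of axis-aligned squares (lower left corner and side length), which works here because the carpet similitudes have no rotation. The names `CARPET_SHIFTS`, `apply_set_map`, `iterate`, and `area` are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F

# Translation vectors of the eight carpet similitudes: every (i/3, j/3)
# with i, j in {0, 1, 2} except the middle one (1/3, 1/3).
CARPET_SHIFTS = [(F(i, 3), F(j, 3))
                 for j in range(3) for i in range(3) if (i, j) != (1, 1)]

def apply_set_map(squares, shifts, s=F(1, 3)):
    """One step of Algorithm 1: the union of T_i(Q) over all similitudes.
    Each square is a triple (x, y, side), with (x, y) its lower left corner."""
    return {(s * x + e, s * y + f, s * side)
            for (x, y, side) in squares
            for (e, f) in shifts}

def iterate(n):
    """Return S_n, starting from S_0 = the unit square."""
    current = {(F(0), F(0), F(1))}
    for _ in range(n):
        current = apply_set_map(current, CARPET_SHIFTS)
    return current

def area(squares):
    # the squares do not overlap, so their areas simply add
    return sum(side * side for (_, _, side) in squares)

for n in range(5):
    print(n, len(iterate(n)), area(iterate(n)))
```

Each step multiplies the number of squares by 8 and the total area by 8/9, so the area of the iterates shrinks toward zero while the sets themselves converge to the carpet.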


EXAMPLE 12 Sierpinski Triangle


Let us construct the Sierpinski triangle determined by the three similitudes given in Equation 10. The corresponding set mapping is $\mathcal{T}(Q) = T_1(Q) \cup T_2(Q) \cup T_3(Q)$. Figure 10.13.17 shows an arbitrary closed and bounded set $S_0$; the first four iterates $S_1, S_2, S_3, S_4$; and the limiting set S (the Sierpinski triangle).



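Figure 10.13.17


EXAMPLE 13 Using Algorithm 1


Consider the following two similitudes:

\[ T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \]

\[ T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} .3 \\ .3 \end{bmatrix} \]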


The actions of these two similitudes on the unit square U are illustrated in Figure 10.13.18. Here, the rotation angle θ is a parameter that we will vary to generate different self-similar sets. The self-similar sets determined by these two similitudes are shown in Figure 10.13.19 for various values of θ. For simplicity, we have not drawn the xy-axes, but in each case the origin is the lower left point of the set. These sets were generated on a computer using Algorithm 1 for the various values of θ. Because k = 2 and s = 1/2, it follows from Equation 2 that the Hausdorff dimension of these sets for any value of θ is 1. It can be shown that the topological dimension of these sets is 1 for θ = 0 and 0 for all other values of θ. It follows that the self-similar set for θ = 0 is not a fractal [it is the straight line segment from (0, 0) to (.6, .6)], while the self-similar sets for all other values of θ are fractals. In particular, they are examples of fractals with integer Hausdorff dimension.


Figure 10.13.19 (panels shown include θ = 50°, 40°, 30°, 20°, 10°, and 0°; for θ = 0 the set runs from (0, 0) to (.6, .6))


A Monte Carlo Approach


The set-mapping approach of constructing self-similar sets described in Algorithm 1 is rather time-consuming on a computer because the similitudes involved must be 
applied to each of the many computer screen pixels in the successive iterated sets. In 1985 Michael Barnsley described an alternative, more practical method of 
generating a self-similar set defined through its similitudes. It is a so-called Monte Carlo method that takes advantage of probability theory. Barnsley refers to it as 
the Random Iteration Algorithm. 


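Let $T_1, T_2, T_3, \ldots, T_k$ be contracting similitudes with the same scale factor. The following algorithm generates a sequence of points

\[ \begin{bmatrix} x_0 \\ y_0 \end{bmatrix},\ \begin{bmatrix} x_1 \\ y_1 \end{bmatrix},\ \ldots,\ \begin{bmatrix} x_n \\ y_n \end{bmatrix},\ \ldots \]

that collectively converge to the set S in Theorem 10.13.1.

Algorithm 2
Step 0 Choose an arbitrary point $\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$ in S.
Step 1 Choose one of the k similitudes at random, say $T_{k_1}$, and compute $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = T_{k_1}\left(\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}\right)$.
Step 2 Choose one of the k similitudes at random, say $T_{k_2}$, and compute $\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = T_{k_2}\left(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}\right)$.
...
Step n Choose one of the k similitudes at random, say $T_{k_n}$, and compute $\begin{bmatrix} x_n \\ y_n \end{bmatrix} = T_{k_n}\left(\begin{bmatrix} x_{n-1} \\ y_{n-1} \end{bmatrix}\right)$.

On a computer screen the pixels corresponding to the points generated by this algorithm will fill out the pixel representation of the limiting set S.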
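
A minimal Python sketch of this random iteration procedure for the Sierpinski carpet is given below. It assumes the eight carpet similitudes have scale factor 1/3, no rotation, and translations (i/3, j/3) with the middle one omitted; the function names, the starting point (0, 0), and the fixed random seed are illustrative choices rather than part of the algorithm.

```python
import random

# Translations of the eight carpet similitudes (the middle square is omitted).
CARPET_SHIFTS = [(i / 3, j / 3)
                 for j in range(3) for i in range(3) if (i, j) != (1, 1)]

def random_iteration(shifts, s, n_points, start=(0.0, 0.0), seed=0):
    """Generate n_points by repeatedly applying a randomly chosen
    similitude (x, y) -> s * (x, y) + (e, f) to the current point."""
    rng = random.Random(seed)   # seeded so that runs are repeatable
    x, y = start
    points = []
    for _ in range(n_points):
        e, f = rng.choice(shifts)
        x, y = s * x + e, s * y + f
        points.append((x, y))
    return points

def in_open_middle_square(x, y):
    """True if (x, y) lies strictly inside the removed middle square."""
    return 1 / 3 < x < 2 / 3 and 1 / 3 < y < 2 / 3

points = random_iteration(CARPET_SHIFTS, 1 / 3, 5000)
print(len(points), any(in_open_middle_square(x, y) for x, y in points))
```

Because the starting point (0, 0) belongs to the carpet, no generated point falls inside the open middle square. Plotting the points with any graphics package shows the carpet filling in as the number of iterations grows.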


Figure 10.13.20 shows four stages of the Random Iteration Algorithm that generate the Sierpinski carpet from a single initial point.


Remark Although Step 0 in the preceding algorithm requires the selection of an initial point in the set S, which may not be known in advance, this is not a serious problem. In practice, one can usually start with any point in $R^2$ and after a few iterations (say ten or so), the point generated will be sufficiently close to S that the algorithm will work correctly from that point on.




5000 iterations 15,000 iterations 45,000 iterations 100,000 iterations 


Figure 10.13.20 


More General Fractals 


So far, we have discussed fractals that are self-similar sets according to the definition of a self-similar set in $R^2$. However, Theorem 10.13.1 remains true if the similitudes $T_1, T_2, \ldots, T_k$ are replaced by more general transformations, called contracting affine transformations. An affine transformation is defined as follows:


DEFINITION 5 


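An affine transformation is a mapping of $R^2$ into $R^2$ of the form

\[ T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix} \]

where a, b, c, d, e, and f are scalars.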


Figure 10.13.21 shows how an affine transformation maps the unit square U onto a parallelogram T(U). An affine transformation is said to be contracting if the Euclidean distance between any two points in the plane is strictly decreased after the two points are mapped by the transformation. It can be shown that any k contracting affine transformations $T_1, T_2, \ldots, T_k$ determine a unique closed and bounded set S satisfying the equation

\[ S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_k(S) \tag{13} \]
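
One way to test whether an affine transformation is contracting, sketched below in Python, uses a standard fact from linear algebra that is not stated in this section: a map that sends (x, y) to A(x, y) + (e, f) strictly decreases all distances exactly when the largest singular value of the 2 × 2 matrix A is less than 1. The translation plays no role. The function names and the sample matrices are hypothetical.

```python
import math

def largest_singular_value(a, b, c, d):
    """Largest singular value of the matrix [[a, b], [c, d]], computed
    as the square root of the largest eigenvalue of A^T A."""
    t = a * a + b * b + c * c + d * d       # trace of A^T A
    det = a * d - b * c                     # det(A); det(A^T A) = det^2
    disc = max(t * t - 4 * det * det, 0.0)  # guard against tiny negatives
    return math.sqrt((t + math.sqrt(disc)) / 2)

def is_contracting(a, b, c, d):
    """True if every distance is strictly decreased by the matrix part."""
    return largest_singular_value(a, b, c, d) < 1

# A similitude with s = 1/2 and a 90-degree rotation: contracting.
print(is_contracting(0.0, -0.5, 0.5, 0.0))
# A shear with factor 1: it stretches some directions, so not contracting.
print(is_contracting(1.0, 1.0, 0.0, 1.0))
# A matrix with zero determinant that flattens the plane onto a short segment.
print(is_contracting(0.0, 0.0, 0.0, 0.16))
```

For a similitude the largest singular value is simply the scale factor s, so this test reduces to the condition 0 < s < 1 used earlier.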


Equation 13 has the same form as Equation 12, which we used to find self-similar sets. Although Equation 13, which uses contracting affine transformations, does not 
determine a self-similar set S, the set it does determine has many of the features of self-similar sets. For example, Figure 10.13.22 shows how a set in the plane 
resembling a fern (an example made famous by Barnsley) can be generated through four contracting affine transformations. Note that the middle fern is the slightly 
overlapping union of the four smaller affine-image ferns surrounding it. Note also how 73, because the determinant of its matrix part is zero, maps the entire fern onto 
the small straight line segment between the points (.50, 0) and (.50, .16). Figure 10.13.22 contains a wealth of information and should be studied carefully. 


(a) Unit square    (b) Unit square after affine transformation, with vertex labels (b + e, d + f) and (a + b + e, c + d + f)


Figure 10.13.21 


(115, 1.030) (.965, .990) 


(340, .495) 


(140, 265) (.600, .275) 
(.075, .180) 


|X| —a 
“[3)=[3 SIG)+[es) 0 )=Le SiG] (8) 


(0,1) a.) 


(0, 0) (1,0) 


GDB MG] <GD-C SIG 


(.705, 414) 


VW 


(50, .16) 


(425, 174) (855, 154) 


(50, 0) 


(575, —.086) 


Figure 10.13.22 


Michael Barnsley has applied the above theory to the field of data compression and transmission. The fern, for example, is completely determined by the four affine transformations $T_1, T_2, T_3, T_4$. These four transformations, in turn, are determined by the 24 numbers given in Figure 10.13.22 defining their corresponding values of a, b, c, d, e, and f. In other words, these 24 numbers completely encode the picture of the fern. Storing these 24 numbers in a computer requires considerably less memory space than storing a pixel-by-pixel description of the fern. In principle, any picture represented by a pixel map on a computer screen can be described through a finite number of affine transformations, although it is not easy to determine which transformations to use. Nevertheless, once encoded, the affine transformations generally require several orders of magnitude less computer memory than a pixel-by-pixel description of the pixel map.


Further Readings 


Readers interested in learning more about fractals are referred to the following books, the first of which elaborates on the linear transformation approach of 
this section. 


1. Michael Barnsley, Fractals Everywhere (New York: Academic Press, 1993). 
2. Benoit B. Mandelbrot, The Fractal Geometry of Nature (New York: W. H. Freeman, 1982). 
3. Heinz-Otto Peitgen and P. H. Richter, The Beauty of Fractals (New York: Springer-Verlag, 1986). 


4. Heinz-Otto Peitgen and Dietmar Saupe, The Science of Fractal Images (New York: Springer-Verlag, 1988). 


Exercise Set 10.13 


1. The self-similar set in Figure Ex-1 has the sizes indicated. Given that its lower left corner is situated at the origin of the xy-plane, find the similitudes that 
determine the set. What is its Hausdorff dimension? Is it a fractal? 


Figure Ex-1 


Answer: 
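\[ T_i\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{12}{25}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \end{bmatrix}, \quad i = 1, 2, 3, 4, \]
where the four values of $\begin{bmatrix} e_i \\ f_i \end{bmatrix}$ are $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} \frac{13}{25} \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ \frac{13}{25} \end{bmatrix}$, and $\begin{bmatrix} \frac{13}{25} \\ \frac{13}{25} \end{bmatrix}$; $d_H(S) = \ln(4) / \ln(25/12) = 1.888\ldots$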


2. Find the Hausdorff dimension of the self-similar set shown in Figure Ex-2. Use a ruler to measure the figure and determine an approximate value of the scale factor s. What are the rotation angles of the similitudes determining this set?


Figure Ex-2 


Answer: 


s ≈ .47; $d_H(S) = \ln(4) / \ln(1/.47) = 1.8\ldots$ Rotation angles: 0° (upper left); −90° (upper right); 180° (lower left); 180° (lower right)


3. Each of the 12 self-similar sets in Figure Ex-3 results from three similitudes with scale factor of 1/2, and so all have Hausdorff dimension ln 3 / ln 2 = 1.584.... The rotation angles of the three similitudes are all multiples of 90°. Find these rotation angles for each set and express them as a triplet of integers $(n_1, n_2, n_3)$, where $n_i$ is the corresponding integer multiple of 90° in the order upper right, lower left, lower right. For example, the first set (the Sierpinski triangle) generates the triplet (0, 0, 0).


Figure Ex-3 


Answer: 


(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (0, 0, 1), (0, 0, 2), (1, 2, 0), (2, 1, 3), (2, 0, 1), (2, 0, 2), (2, 2, 0), (0, 3, 3)
4. For each of the self-similar sets in Figure Ex-4, find: 

(i) the scale factor s of the similitudes describing the set; 

(ii) the rotation angles θ of all similitudes describing the set (all rotation angles are multiples of 90°); and

(iii) the Hausdorff dimension of the set. 

Which of the sets are fractals and why? 


Figure Ex-4 


Answer: 


(a) (i) s = 1/3; (ii) all rotation angles are 0°; (iii) $d_H(S) = \ln(7) / \ln(3) = 1.771\ldots$ This set is a fractal.
(b) (i) s = 1/2; (ii) all rotation angles are 180°; (iii) $d_H(S) = \ln(3) / \ln(2) = 1.584\ldots$ This set is a fractal.
(c) (i) s = 1/2; (ii) rotation angles: −90° (top); 180° (lower left); 180° (lower right); (iii) $d_H(S) = \ln(3) / \ln(2) = 1.584\ldots$ This set is a fractal.
(d) (i) s = 1/2; (ii) rotation angles: 90° (upper left); 180° (upper right); 180° (lower right); (iii) $d_H(S) = \ln(3) / \ln(2) = 1.584\ldots$ This set is a fractal.
5. Show that of the four affine transformations shown in Figure 10.13.22, only the transformation 7 is a similitude. Determine its scale factor s and rotation angle g. 


Answer: 


s = .8509..., θ = −2.69°...
6. Find the coordinates of the tip of the fern in Figure 10.13.22. [Hint: The transformation 73 maps the tip of the fern to itself.] 
Answer: 


(0.766, 0.996) rounded to three decimal places 


7. The square in Figure 10.13.7a was expressed as the union of 4 nonoverlapping squares as in Figure 10.13.7b. Suppose that it is expressed instead as the union of 
16 nonoverlapping squares. Verify that its Hausdorff dimension is still 2, as determined by Equation 2. 


Answer: 


$d_H(S) = \ln(16) / \ln(4) = 2$
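8. Show that the four similitudes

\[ T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{3}{4}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}, \qquad T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{3}{4}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \frac{1}{4} \\ 0 \end{bmatrix} \]

\[ T_3\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{3}{4}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 0 \\ \frac{1}{4} \end{bmatrix}, \qquad T_4\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{3}{4}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \frac{1}{4} \\ \frac{1}{4} \end{bmatrix} \]

express the unit square as the union of four overlapping squares. Evaluate the right-hand side of Equation 2 for the values of k and s determined by these similitudes, and show that the result is not the correct value of the Hausdorff dimension of the unit square. [Note: This exercise shows the necessity of the nonoverlapping condition in the definition of a self-similar set and its Hausdorff dimension.]


Answer:
$\ln(4) / \ln(4/3) = 4.818\ldots$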


9. All of the results in this section can be extended to $R^n$. Compute the Hausdorff dimension of the unit cube in $R^3$ (see Figure Ex-9). Given that the topological dimension of the unit cube is 3, determine whether it is a fractal. [Hint: Express the unit cube as the union of eight smaller congruent nonoverlapping cubes.]


Figure Ex-9 


Answer: 


$d_H(S) = \ln(8) / \ln(2) = 3$; the cube is not a fractal.


10. The set in $R^3$ in Figure Ex-10 is called the Menger sponge. It is a self-similar set obtained by drilling out certain square holes from the unit cube. Note that each face of the Menger sponge is a Sierpinski carpet and that the holes in the Sierpinski carpet now run all the way through the Menger sponge. Determine the values of k and s for the Menger sponge and find its Hausdorff dimension. Is the Menger sponge a fractal?


Figure Ex-10 


Answer: 


k = 20; s = 1/3; $d_H(S) = \ln(20) / \ln(3) = 2.726\ldots$; the set is a fractal.


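11. The two similitudes

\[ T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{3}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}, \qquad T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{3}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \frac{2}{3} \\ 0 \end{bmatrix} \]

determine a fractal known as the Cantor set. Starting with the unit square region U as an initial set, sketch the first four sets that Algorithm 1 determines. Also, find the Hausdorff dimension of the Cantor set. (This famous set was the first example that Hausdorff gave in his 1919 paper of a set whose Hausdorff dimension is not equal to its topological dimension.)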


Answer: 


(Sketches of the initial set and of the first, second, third, and fourth iterates)


$d_H(S) = \ln(2) / \ln(3) = 0.6309\ldots$


12. Compute the areas of the sets $S_0$, $S_1$, $S_2$, $S_3$, and $S_4$ in Figure 10.13.15.
Answer:
Area of $S_0$ = 1; area of $S_1$ = 8/9 = 0.888...; area of $S_2$ = $(8/9)^2$ = 0.790...; area of $S_3$ = $(8/9)^3$ = 0.702...; area of $S_4$ = $(8/9)^4$ = 0.624...


Section 10.13 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may 
also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you 
have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets. 


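T1. Use similitudes of the form

\[ T_i\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = \frac{1}{3}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} + \begin{bmatrix} a_i \\ b_i \\ c_i \end{bmatrix} \]

to show that the Menger sponge (see Exercise 10) is the set S satisfying

\[ S = \bigcup_{i=1}^{20} T_i(S) \]

for appropriately chosen similitudes $T_i$ (for i = 1, 2, 3, ..., 20). Determine these similitudes by determining the collection of 3 × 1 matrices

\[ \begin{bmatrix} a_i \\ b_i \\ c_i \end{bmatrix} \quad \text{for } i = 1, 2, 3, \ldots, 20 \]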


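T2. Generalize the ideas involved in the Cantor set (in $R^1$), the Sierpinski carpet (in $R^2$), and the Menger sponge (in $R^3$) to $R^n$ by considering the set S satisfying

\[ S = \bigcup_{i=1}^{m_n} T_i(S) \]

with

\[ T_i\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix}\right) = \frac{1}{3}\begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} a_{1i} \\ a_{2i} \\ a_{3i} \\ \vdots \\ a_{ni} \end{bmatrix} \]

where each $a_{ji}$ equals 0, 1/3, or 2/3, and no two of them ever equal 1/3 at the same time. Use a computer to construct the set

\[ \begin{bmatrix} a_{1i} \\ a_{2i} \\ a_{3i} \\ \vdots \\ a_{ni} \end{bmatrix} \quad \text{for } i = 1, 2, 3, \ldots, m_n \]

thereby determining the value of $m_n$ for n = 2, 3, 4. Then develop an expression for $m_n$.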




10.14 Chaos 


In this section we use a map of the unit square in the xy-plane onto itself to describe the concept of a chaotic mapping. 


Prerequisites 


Geometry of Linear Operators on $R^2$ (Section 4.11)
Eigenvalues and Eigenvectors 


Intuitive Understanding of Limits and Continuity 


Chaos 


The word chaos was first used in a mathematical sense in 1975 by Tien-Yien Li and James Yorke in a paper entitled “Period 
Three Implies Chaos.” The term is now used to describe the behavior of certain mathematical mappings and physical phenomena 
that at first glance seem to behave in a random or disorderly fashion but actually have an underlying element of order (examples 
include random-number generation, shuffling cards, cardiac arrhythmia, fluttering airplane wings, changes in the red spot of 
Jupiter, and deviations in the orbit of Pluto). In this section we discuss a particular chaotic mapping called Arnold's cat map, after 
the Russian mathematician Vladimir I. Arnold who first described it using a diagram of a cat. 


Arnold's Cat Map 


To describe Arnold's cat map, we need a few ideas about modular arithmetic. If x is a real number, then the notation x mod 1 denotes the unique number in the interval [0, 1) that differs from x by an integer. For example,

2.3 mod 1 = 0.3,   0.9 mod 1 = 0.9,   −3.7 mod 1 = 0.3,   2.0 mod 1 = 0

Note that if x is a nonnegative number, then x mod 1 is simply the fractional part of x. If (x, y) is an ordered pair of real numbers, then the notation (x, y) mod 1 denotes (x mod 1, y mod 1). For example,

(2.3, −7.9) mod 1 = (0.3, 0.1)

Observe that for every real number x, the point x mod 1 lies in the unit interval [0, 1) and that for every ordered pair (x, y), the point (x, y) mod 1 lies in the unit square

S = {(x, y) | 0 ≤ x < 1, 0 ≤ y < 1}

Also observe that the upper boundary and the right-hand boundary of the square are not included in S.
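
The mod 1 operation can be sketched in Python as "subtract the floor," which handles negative numbers in the way described above. The helper names `mod1` and `mod1_pair` are illustrative, and exact fractions are used in the checks because decimal values such as 2.3 are not represented exactly in floating-point arithmetic.

```python
import math
from fractions import Fraction

def mod1(x):
    """Return x mod 1: the unique number in [0, 1) that differs
    from x by an integer."""
    return x - math.floor(x)

def mod1_pair(x, y):
    """Apply mod 1 to each coordinate of an ordered pair."""
    return mod1(x), mod1(y)

# The worked examples from the text, written as exact fractions.
print(mod1(Fraction(23, 10)))    # 2.3 mod 1
print(mod1(Fraction(-37, 10)))   # -3.7 mod 1
print(mod1_pair(Fraction(23, 10), Fraction(-79, 10)))
```

For a negative number such as −3.7 the result is 0.3 rather than 0.7, because the integer subtracted is the floor −4.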


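Arnold's cat map is the transformation $\Gamma: R^2 \to R^2$ defined by the formula

\[ \Gamma: (x, y) \to (x + y,\ x + 2y) \bmod 1 \]

or, in matrix notation,

\[ \Gamma\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \bmod 1 \tag{1} \]

To understand the geometry of Arnold's cat map, it is helpful to write Equation 1 in the factored form

\[ \Gamma\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \bmod 1 \]

which expresses Arnold's cat map as the composition of a shear in the x-direction with factor 1, followed by a shear in the y-direction with factor 1. Because the computations are performed mod 1, Γ maps all points of $R^2$ into the unit square S.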


We will illustrate the effect of Arnold's cat map on the unit square S, which is shaded in Figure 10.14.1a and contains a picture of a cat. It can be shown that it does not matter whether the mod 1 computations are carried out after each shear or at the very end. We will discuss both methods, first performing them at the end. The steps are as follows:


Step 1 Shear in the x-direction with factor 1 (Figure 10.14.1b):
(x, y) → (x + y, y)

or, in matrix notation,

\[ \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x + y \\ y \end{bmatrix} \]


Step 2 Shear in the y-direction with factor 1 (Figure 10.14.1c):
(x, y) → (x, x + y)

or, in matrix notation,

\[ \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ x + y \end{bmatrix} \]


Step 3 Reassembly into S (Figure 10.14.1d):
(x, y) → (x, y) mod 1


The geometric effect of the mod 1 arithmetic is to break up the parallelogram in Figure 10.14.1c and reassemble the pieces in S as shown in Figure 10.14.1d.


(a) Unit square S    (b) After Step 1    (c) After Step 2    (d) After Step 3
Figure 10.14.1 


For computer implementation, it is more convenient to perform the mod 1 arithmetic at each step, rather than at the end. With this approach there is a reassembly at each step, but the net effect is the same. The steps are as follows:


Step 1 Shear in the x-direction with factor 1, followed by a reassembly into S (Figure 10.14.2b):
(x, y) → (x + y, y) mod 1

Step 2 Shear in the y-direction with factor 1, followed by a reassembly into S (Figure 10.14.2c):
(x, y) → (x, x + y) mod 1
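
The claim that both orderings give the same result can be checked on sample points. The Python sketch below compares the cat map computed with a single mod 1 at the end against the version that reassembles after each shear. The function names are illustrative, and exact fractions are used so that the comparison is not affected by rounding.

```python
import math
from fractions import Fraction

def mod1(x):
    """x mod 1: the number in [0, 1) that differs from x by an integer."""
    return x - math.floor(x)

def cat_map(x, y):
    """Both shears first, then a single mod 1 at the end."""
    return mod1(x + y), mod1(x + 2 * y)

def cat_map_stepwise(x, y):
    """Shear in x, reassemble; then shear in y, reassemble."""
    x, y = mod1(x + y), mod1(y)      # Step 1: (x, y) -> (x + y, y) mod 1
    x, y = mod1(x), mod1(x + y)      # Step 2: (x, y) -> (x, x + y) mod 1
    return x, y

# Compare the two versions on a grid of points in the unit square.
grid = [(Fraction(m, 7), Fraction(n, 7)) for m in range(7) for n in range(7)]
print(all(cat_map(x, y) == cat_map_stepwise(x, y) for x, y in grid))
```

The agreement follows because adding an integer to either coordinate before a shear changes the sheared coordinates only by integers, which the final mod 1 removes.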


(a) Unit square S    (b) After Step 1: (x, y) → (x + y, y) mod 1    (c) After Step 2: (x, y) → (x, x + y) mod 1


Figure 10.14.2 


Repeated Mappings 


Chaotic mappings such as Arnold's cat map usually arise in physical models in which an operation is performed repeatedly. For 
example, cards are mixed by repeated shuffles, paint is mixed by repeated stirs, water in a tidal basin is mixed by repeated tidal 
changes, and so forth. Thus, we are interested in examining the effect on S of repeated applications (or iterations) of Arnold's cat 
map. Figure 10.14.3, which was generated on a computer, shows the effect of 25 iterations of Arnold's cat map on the cat in the 
unit square S. Two interesting phenomena occur: 

• The cat returns to its original form at the 25th iteration.


• At some of the intermediate iterations, the cat is decomposed into streaks that seem to have a specific direction.


Much of the remainder of this section is devoted to explaining these phenomena. 


(Panels show the 101-pixel by 101-pixel image at successive iterations up to Iteration 25.)


Figure 10.14.3 


Periodic Points 


Our first goal is to explain why the cat in Figure 10.14.3 returns to its original configuration at the 25th iteration. For this purpose it will be helpful to think of a picture in the xy-plane as an assignment of colors to the points in the plane. For pictures generated on a computer screen or other digital device, hardware limitations require that a picture be broken up into discrete squares, called pixels. For example, in the computer-generated pictures in Figure 10.14.3 the unit square S is divided into a grid with 101 pixels on a side for a total of 10,201 pixels, each of which is black or white (Figure 10.14.4). An assignment of colors to pixels to create a picture is called a pixel map.


Enlarged view of cat's face showing individual pixels


Figure 10.14.4 


As shown in Figure 10.14.5, each pixel in S can be assigned a unique pair of coordinates of the form (m/101, n/101) that identifies its lower left-hand corner, where m and n are integers in the range 0, 1, 2, ..., 100. We call these points pixel points because each such point identifies a unique pixel. Instead of restricting the discussion to the case where S is subdivided into an array with 101 pixels on a side, let us consider the more general case where there are p pixels per side. Thus, each pixel map in S consists of $p^2$ pixels uniformly spaced 1/p units apart in both the x- and the y-directions. The pixel points in S have coordinates of the form (m/p, n/p), where m and n are integers ranging from 0 to p − 1.




Figure 10.14.5 


Under Arnold's cat map each pixel point of S is transformed into another pixel point of S. To see why this is so, observe that the image of the pixel point (m/p, n/p) under Γ is given in matrix form by

\[ \Gamma\left(\begin{bmatrix} \frac{m}{p} \\ \frac{n}{p} \end{bmatrix}\right) = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} \frac{m}{p} \\ \frac{n}{p} \end{bmatrix} \bmod 1 = \begin{bmatrix} \frac{m + n}{p} \\ \frac{m + 2n}{p} \end{bmatrix} \bmod 1 \tag{2} \]

The ordered pair ((m + n)/p, (m + 2n)/p) mod 1 is of the form (m'/p, n'/p), where m' and n' lie in the range 0, 1, 2, ..., p − 1. Specifically, m' and n' are the remainders when m + n and m + 2n are divided by p, respectively. Consequently, each point in S of the form (m/p, n/p) is mapped onto another point of the same form.


Because Arnold's cat map transforms every pixel point of S into another pixel point of S, and because there are only $p^2$ different pixel points in S, it follows that any given pixel point must return to its original position after at most $p^2$ iterations of Arnold's cat map.
EXAMPLE 1 Using Formula 2


If p = 76, then Equation 2 becomes

\[ \Gamma\left(\begin{bmatrix} \frac{m}{76} \\ \frac{n}{76} \end{bmatrix}\right) = \begin{bmatrix} \frac{m + n}{76} \\ \frac{m + 2n}{76} \end{bmatrix} \bmod 1 \]

In this case the successive iterates of the point (27/76, 58/76) are

Iterate   0       1       2       3       4       5       6       7       8
x         27/76   9/76    0/76    67/76   49/76   4/76    39/76   37/76   72/76
y         58/76   67/76   67/76   58/76   31/76   35/76   74/76   35/76   31/76

(verify). Because the point returns to its initial position on the ninth application of Arnold's cat map (but no sooner), 
the point is said to have period 9, and the set of nine distinct iterates of the point is called a 9-cycle. Figure 10.14.6 
shows this 9-cycle with the initial point labeled 0 and its successive iterates labeled accordingly. 
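
The 9-cycle in this example can be reproduced with integer arithmetic on the numerators. In the Python sketch below, a pixel point (m/p, n/p) is stored as the integer pair (m, n); the function names `cat_map_pixel` and `orbit` are illustrative.

```python
def cat_map_pixel(m, n, p):
    """Image of the pixel point (m/p, n/p), stored as numerators (m, n):
    the remainders of m + n and m + 2n on division by p."""
    return (m + n) % p, (m + 2 * n) % p

def orbit(m, n, p):
    """List the distinct iterates of (m/p, n/p), starting with the point
    itself and stopping just before it returns to its initial position."""
    points = [(m, n)]
    current = cat_map_pixel(m, n, p)
    while current != (m, n):
        points.append(current)
        current = cat_map_pixel(current[0], current[1], p)
    return points

cycle = orbit(27, 58, 76)
print(len(cycle))   # the period of the point (27/76, 58/76)
print(cycle)
```

The loop always terminates because the map is one-to-one on the finite set of pixel points, so every point eventually returns to where it started.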




Figure 10.14.6 


In general, a point that returns to its initial position after n applications of Arnold's cat map, but does not return with fewer than n applications, is said to have period n, and its set of n distinct iterates is called an n-cycle. Arnold's cat map maps (0, 0) into (0, 0), so this point has period 1. Points with period 1 are also called fixed points. We leave it as an exercise (Exercise 11) to show that (0, 0) is the only fixed point of Arnold's cat map.


Period Versus Pixel Width 


If $P_1$ and $P_2$ are points with periods $q_1$ and $q_2$, respectively, then $P_1$ returns to its initial position in $q_1$ iterations (but no sooner), and $P_2$ returns to its initial position in $q_2$ iterations (but no sooner); thus, both points return to their initial positions in any number of iterations that is a multiple of both $q_1$ and $q_2$. In general, for a pixel map with $p^2$ pixel points of the form (m/p, n/p), we let Π(p) denote the least common multiple of the periods of all the pixel points in the map [i.e., Π(p) is the smallest integer that is divisible by all of the periods]. It follows that the pixel map will return to its initial configuration in Π(p) iterations of Arnold's cat map (but no sooner). For this reason, we call Π(p) the period of the pixel map. In Exercise 4 we ask you to show that if p = 101, then all pixel points have period 1, 5, or 25, so Π(101) = 25. This explains why the cat in Figure 10.14.3 returned to its initial configuration in 25 iterations.
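
The period of a pixel map can be computed directly from this definition. The Python sketch below walks through every cycle of the pixel points once and accumulates the least common multiple of the cycle lengths. The function name `pixel_map_period` is illustrative, and the approach is a brute-force one that is practical only for moderate values of p.

```python
from math import gcd

def pixel_map_period(p):
    """Least common multiple of the periods of all p * p pixel points
    under Arnold's cat map, with points stored as numerator pairs."""
    seen = set()
    period = 1
    for m in range(p):
        for n in range(p):
            if (m, n) in seen:
                continue
            # Follow the cycle through (m, n), marking its points as seen.
            length = 0
            current = (m, n)
            while current not in seen:
                seen.add(current)
                current = ((current[0] + current[1]) % p,
                           (current[0] + 2 * current[1]) % p)
                length += 1
            period = period * length // gcd(period, length)
    return period

print(pixel_map_period(101))
print(pixel_map_period(250))
```

Each pixel point is visited exactly once, so the work grows in proportion to the number of pixel points.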


Figure 10.14.7 shows how the period of a pixel map varies with p. Although the general tendency is for the period to increase as p 
increases, there is a surprising amount of irregularity in the graph. Indeed, there is no simple function that specifies this 
relationship (see Exercise 1). 


[Plot: Π(p) (period) versus p (side length of unit square in pixels), for p from 0 to 500]


Figure 10.14.7 


Although a pixel map with p pixels on a side does not return to its initial configuration until Π(p) iterations have occurred,
various unexpected things can occur at intermediate iterations. For example, Figure 10.14.8 shows a pixel map with p = 250 of
the famous Hungarian-American mathematician John von Neumann. It can be shown that Π(250) = 750; hence, the pixel map
will return to its initial configuration after 750 iterations of Arnold's cat map (but no sooner). However, after 375 iterations the
pixel map is turned upside down, and after another 375 iterations (for a total of 750) the pixel map is returned to its initial
configuration. Moreover, there are so many pixel points with periods that divide 750 that multiple ghostlike images of the original
likeness occur at intermediate iterations; at 195 iterations numerous miniatures of the original likeness occur in diagonal rows.


[Panels: original image (250 pixels × 250 pixels); 75 iterations; 125 iterations; 195 iterations; 250 iterations; 375 iterations]


Figure 10.14.8 


The Tiled Plane 


Our next objective is to explain the cause of the linear streaks that occur in Figure 10.14.3. For this purpose it will be helpful to
view Arnold's cat map another way. As defined, Arnold's cat map is not a linear transformation because of the mod 1 arithmetic.
However, there is an alternative way of defining Arnold's cat map that avoids the mod 1 arithmetic and results in a linear
transformation. For this purpose, imagine that the unit square S with its picture of the cat is a “tile,” and suppose that the entire
plane is covered with such tiles, as in Figure 10.14.9. We say that the xy-plane has been tiled with the unit square. If we apply the
matrix transformation in (1) to the entire tiled plane without performing the mod 1 arithmetic, then it can be shown that the portion
of the image within S will be identical to the image that we obtained using the mod 1 arithmetic (Figure 10.14.9). In short, the
tiling results in the same pixel map in S as the mod 1 arithmetic, but in the tiled case Arnold's cat map is a linear transformation.


[Panels: Step 1: (x, y) → (x + y, y); Step 2: (x, y) → (x, x + y); Step 3: (x, y) → (x, y) mod 1]


Figure 10.14.9 


It is important to understand, however, that tiling and mod 1 arithmetic produce periodicity in different ways. If a pixel map in S
has period n, then in the case of mod 1 arithmetic, each point returns to its original position at the end of n iterations. In the case
of tiling, points need not return to their original positions; rather, each point is replaced by a point of the same color at the end of n
iterations.


Properties of Arnold's Cat Map 


To understand the cause of the streaks in Figure 10.14.3, think of Arnold's cat map as a linear transformation on the tiled plane. 
Observe that the matrix 
$$C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$$


that defines Arnold's cat map is symmetric and has a determinant of 1. The fact that the determinant is 1 means that multiplication
by this matrix preserves areas; that is, the area of any figure in the plane and the area of its image are the same. This is also true
for figures in S in the case of mod 1 arithmetic, since the effect of the mod 1 arithmetic is to cut up the figure and reassemble the
pieces without any overlap, as shown in Figure 10.14.1d. Thus, in Figure 10.14.3 the area of the cat (whatever it is) is the same as
the total area of the blotches in each iteration.


The fact that the matrix is symmetric means that its eigenvalues are real and the corresponding eigenvectors are perpendicular. We 
leave it for you to show that the eigenvalues and corresponding eigenvectors of C are 


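$$\lambda_1 = \frac{3 + \sqrt{5}}{2} = 2.6180\ldots, \qquad \lambda_2 = \frac{3 - \sqrt{5}}{2} = 0.3819\ldots$$

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ \frac{1 + \sqrt{5}}{2} \end{bmatrix} = \begin{bmatrix} 1 \\ 1.6180\ldots \end{bmatrix}, \qquad \mathbf{v}_2 = \begin{bmatrix} 1 \\ \frac{1 - \sqrt{5}}{2} \end{bmatrix} = \begin{bmatrix} 1 \\ -0.6180\ldots \end{bmatrix}$$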


For each application of Arnold's cat map, the eigenvalue λ1 causes a stretching in the direction of the eigenvector v1 by a factor
of 2.6180..., and the eigenvalue λ2 causes a compression in the direction of the eigenvector v2 by a factor of 0.3819.... Figure
10.14.10 shows a square centered at the origin whose sides are parallel to the two eigenvector directions. Under the above
mapping, this square is deformed into the rectangle whose sides are also parallel to the two eigenvector directions. The areas of the
square and rectangle are the same.
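
The eigenvalue claims can be checked numerically. The sketch below is an illustration added here, with hypothetical helper names; it verifies that C v = λ v for the two closed-form eigenpairs, that the eigenvectors are perpendicular, and that λ1 λ2 = det(C) = 1, which is why areas are preserved.

```python
from math import sqrt, isclose

# Matrix of Arnold's cat map and its closed-form eigenpairs.
C = [[1, 1], [1, 2]]
lam1 = (3 + sqrt(5)) / 2
lam2 = (3 - sqrt(5)) / 2
v1 = (1.0, (1 + sqrt(5)) / 2)
v2 = (1.0, (1 - sqrt(5)) / 2)


def mat_vec(matrix, v):
    """Product of a 2 x 2 matrix with a 2-vector."""
    return (
        matrix[0][0] * v[0] + matrix[0][1] * v[1],
        matrix[1][0] * v[0] + matrix[1][1] * v[1],
    )


def is_eigenpair(matrix, lam, v):
    """True if matrix * v equals lam * v up to rounding error."""
    w = mat_vec(matrix, v)
    return all(isclose(w[i], lam * v[i], rel_tol=1e-12) for i in range(2))


def dot(u, v):
    """Euclidean inner product of two 2-vectors."""
    return u[0] * v[0] + u[1] * v[1]


if __name__ == "__main__":
    print(is_eigenpair(C, lam1, v1), is_eigenpair(C, lam2, v2))
    print(dot(v1, v2), lam1 * lam2)
```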


Figure 10.14.10 


To explain the cause of the streaks in Figure 10.14.3, consider S to be part of the tiled plane, and let p be a point of S with period
n. Because we are considering tiling, there is a point q in the plane with the same color as p that on successive iterations moves
toward the position initially occupied by p, reaching that position on the nth iteration. Writing A for the matrix of Arnold's cat
map, this point is q = (A^{-1})^n p = A^{-n} p, since

$$A^n \mathbf{q} = A^n (A^{-n} \mathbf{p}) = \mathbf{p}$$

Thus, with successive iterations, points of S flow away from their initial positions, while at the same time other points in the plane
(with corresponding colors) flow toward those initial positions, completing their trip on the final iteration of the cycle. Figure
10.14.11 illustrates this in the case where

$$n = 4, \qquad \mathbf{q} = \begin{bmatrix} -\frac{8}{3} \\ \frac{5}{3} \end{bmatrix}, \qquad \mathbf{p} = A^4 \mathbf{q} = \begin{bmatrix} \frac{1}{3} \\ \frac{2}{3} \end{bmatrix}$$

Note that p mod 1 = q mod 1 = (1/3, 2/3), so both points occupy the same positions on their respective tiles. The outgoing point moves in
the general direction of the eigenvector v1, as indicated by the arrows in Figure 10.14.11, and the incoming point moves in the
general direction of eigenvector v2. It is the “flow lines” in the general directions of the eigenvectors that form the streaks in
Figure 10.14.3.




Figure 10.14.11 


Nonperiodic Points 


Thus far we have considered the effect of Arnold's cat map on pixel points of the form (m/p, n/p) for an arbitrary positive
integer p. We know that all such points are periodic. We now consider the effect of Arnold's cat map on an arbitrary point (a, b)
in S. We classify such points as rational if the coordinates a and b are both rational numbers, and irrational if at least one of the
coordinates is irrational. Every rational point is periodic, since it is a pixel point for a suitable choice of p. For example, the
rational point (r1/s1, r2/s2) can be written as (r1 s2 / s1 s2, r2 s1 / s1 s2), so it is a pixel point with p = s1 s2. It can be shown
(Exercise 13) that the converse is also true: Every periodic point must be a rational point.


It follows from the preceding discussion that the irrational points in S are nonperiodic, so that successive iterates of an irrational
point (x0, y0) in S must all be distinct points in S. Figure 10.14.12, which was computer generated, shows an irrational point and
selected iterates up to 100,000. For the particular irrational point that we selected, the iterates do not seem to cluster in any
particular region of S; rather, they appear to be spread throughout S, becoming denser with successive iterations.


[Panels: initial point; 1000 iterations; 2000 iterations; 10,000 iterations; 25,000 iterations; 100,000 iterations]


Figure 10.14.12 


The behavior of the iterates in Figure 10.14.12 is sufficiently important that there is some terminology associated with it. We say 
that a set D of points in S is dense in S if every circle centered at any point of S encloses points of D, no matter how small the 
radius of the circle is taken (Figure 10.14.13). It can be shown that the rational points are dense in S and the iterates of most (but 
not all) of the irrational points are dense in S. 


[Figure labels: arbitrary circle in S; points of set D]


Figure 10.14.13 


Definition of Chaos 


We know that under Arnold's cat map, the rational points of S are periodic and dense in S and that some but not all of the 
irrational points have iterates that are dense in S. These are the basic ingredients of chaos. There are several definitions of chaos in 


current use, but the following one, which is an outgrowth of a definition introduced by Robert L. Devaney in 1986 in his book An 
Introduction to Chaotic Dynamical Systems (Benjamin/Cummings Publishing Company), is most closely related to our work. 


DEFINITION 1 


A mapping T of S onto itself is said to be chaotic if: 
(i) S contains a dense set of periodic points of the mapping T. 


(ii) There is a point in S whose iterates under T are dense in S. 


Thus Arnold's cat map satisfies the definition of a chaotic mapping. What is noteworthy about this definition is that a chaotic 
mapping exhibits an element of order and an element of disorder—the periodic points move regularly in cycles, but the points 
with dense iterates move irregularly, often obscuring the regularity of the periodic points. This fusion of order and disorder 
characterizes chaotic mappings. 


Dynamical Systems 


Chaotic mappings arise in the study of dynamical systems. Informally stated, a dynamical system can be viewed as a system that 
has a specific state or configuration at each point of time but that changes its state with time. Chemical systems, ecological 
systems, electrical systems, biological systems, economic systems, and so forth can be looked at in this way. In a discrete-time 
dynamical system, the state changes at discrete points of time rather than at each instant. In a discrete-time chaotic dynamical 
system, each state results from a chaotic mapping of the preceding state. For example, if one imagines that Arnold's cat map is 
applied at discrete points of time, then the pixel maps in Figure 10.14.3 can be viewed as the evolution of a discrete-time chaotic 
dynamical system from some initial set of states (each point of the cat is a single initial state) to successive sets of states. 


One of the fundamental problems in the study of dynamical systems is to predict future states of the system from a known initial 
state. In practice, however, the exact initial state is rarely known because of errors in the devices used to measure the initial state. 
It was believed at one time that if the measuring devices were sufficiently accurate and the computers used to perform the 
iteration were sufficiently powerful, then one could predict the future states of the system to any degree of accuracy. But the 
discovery of chaotic systems shattered this belief because it was found that for such systems the slightest error in measuring the 
initial state or in the computation of the iterates becomes magnified exponentially, thereby preventing an accurate prediction of 
future states. Let us demonstrate this sensitivity to initial conditions with Arnold's cat map. 


Suppose that P0 is a point in the xy-plane whose exact coordinates are (0.77837, 0.70904). A measurement error of 0.00001 is
made in the y-coordinate, such that the point is thought to be located at (0.77837, 0.70905), which we denote by Q0. Both P0
and Q0 are pixel points with p = 100,000 (why?), and thus, since Π(100,000) = 75,000, both return to their initial positions
after 75,000 iterations. In Figure 10.14.14 we show the first 50 iterates of P0 under Arnold's cat map as crosses and the first 50
iterates of Q0 as circles. Although P0 and Q0 are close enough that their symbols overlap initially, only their first eight iterates
have overlapping symbols; from the ninth iteration on their iterates follow divergent paths.


Figure 10.14.14 


It is possible to quantify the growth of the error from the eigenvalues and eigenvectors of Arnold's cat map. For this purpose we
will think of Arnold's cat map as a linear transformation on the tiled plane. Recall from Figure 10.14.10 and the related discussion
that the projected distance between two points in S in the direction of the eigenvector v1 increases by a factor of 2.6180... (= λ1)
with each iteration (Figure 10.14.15). After nine iterations this projected distance increases by a factor of
(2.6180...)^9 = 5777.99..., and with an initial error of roughly 1/100,000 in the direction of v1, this distance is 0.05777..., or
about 1/17 the width of the unit square S. After 12 iterations this small initial error grows to (2.6180...)^12 / 100,000 = 1.0368...,
which is greater than the width of S. Thus, we completely lose track of the true iterates within S after 12 iterations because of the
exponential growth of the initial error.
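
The growth figures quoted above can be reproduced with a few lines of Python. This is a sketch under stated assumptions rather than the procedure used to draw Figure 10.14.14: the points are stored as integer numerators over p = 100,000 so that no rounding enters the iteration, and the names `iterate` and `separation` are illustrative.

```python
P = 100_000                  # pixel width for which P0 and Q0 are pixel points
P0 = (77837, 70904)          # numerators of (0.77837, 0.70904)
Q0 = (77837, 70905)          # numerators of (0.77837, 0.70905)
lam1 = (3 + 5 ** 0.5) / 2    # stretching eigenvalue 2.6180...


def iterate(point, steps, p=P):
    """Apply Arnold's cat map `steps` times to a pixel point given by its numerators."""
    m, n = point
    for _ in range(steps):
        m, n = (m + n) % p, (m + 2 * n) % p
    return m, n


def separation(steps, p=P):
    """Numerators (mod p) of the difference between the iterates of Q0 and P0."""
    pm, pn = iterate(P0, steps, p)
    qm, qn = iterate(Q0, steps, p)
    return (qm - pm) % p, (qn - pn) % p


growth_after_9 = lam1 ** 9        # factor by which the error is stretched in 9 steps
error_after_12 = lam1 ** 12 / P   # size of an initial 1/100,000 error after 12 steps

if __name__ == "__main__":
    print(growth_after_9, error_after_12)
    print(separation(9), separation(12))
```

Because the map is linear before the mod is taken, the separation after n steps is the nth power of the cat map matrix applied to the initial error (0, 1)/100,000, which is why consecutive Fibonacci numbers appear in the output.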


Figure 10.14.15 


Although sensitivity to initial conditions limits the ability to predict the future evolution of dynamical systems, new techniques 
are presently being investigated to describe this future evolution in alternative ways. 


Exercise Set 10.14 


1. In a journal article [F. J. Dyson and H. Falk, “Period of a Discrete Cat Mapping,” The American Mathematical Monthly, 99
(August-September 1992), pp. 603-614] the following results concerning the nature of the function Π(p) were established:

(i) Π(p) = 3p if and only if p = 2 · 5^k for k = 1, 2, ....
(ii) Π(p) = 2p if and only if p = 5^k for k = 1, 2, ... or p = 6 · 5^k for k = 0, 1, 2, ....
(iii) Π(p) ≤ 12p/7 for all other choices of p.

Find Π(250), Π(25), Π(125), Π(30), Π(10), Π(50), Π(3750), Π(6), and Π(5).


Answer: 


Π(250) = 750, Π(25) = 50, Π(125) = 250, Π(30) = 60, Π(10) = 30, Π(50) = 150, Π(3750) = 7500, Π(6) = 12,
Π(5) = 10


2. Find all the n-cycles that are subsets of the 36 points in S of the form (m/6, n/6) with m and n in the range 0, 1, 2, 3, 4, 5.
Then find Π(6).


Answer: 


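One 1-cycle: {(0, 0)}; one 3-cycle: {(3/6, 0), (3/6, 3/6), (0, 3/6)}; two 4-cycles: {(4/6, 0), (4/6, 4/6), (2/6, 0), (2/6, 2/6)} and
{(0, 2/6), (2/6, 4/6), (0, 4/6), (4/6, 2/6)}; two 12-cycles:

{(0, 1/6), (1/6, 2/6), (3/6, 5/6), (2/6, 1/6), (3/6, 4/6), (1/6, 5/6), (0, 5/6), (5/6, 4/6), (3/6, 1/6), (4/6, 5/6), (3/6, 2/6), (5/6, 1/6)}

{(1/6, 0), (1/6, 1/6), (2/6, 3/6), (5/6, 2/6), (1/6, 3/6), (4/6, 1/6), (5/6, 0), (5/6, 5/6), (4/6, 3/6), (1/6, 4/6), (5/6, 3/6), (2/6, 5/6)}

Π(6) = 12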


3. (Fibonacci Shift-Register Random-Number Generator) A well-known method of generating a sequence of “pseudorandom”
integers x0, x1, x2, x3, ... in the interval from 0 to p − 1 is based on the following algorithm:

(i) Pick any two integers x0 and x1 from the range 0, 1, 2, ..., p − 1.

(ii) Set x_{n+1} = (x_n + x_{n−1}) mod p for n = 1, 2, ....

Here x mod p denotes the number in the interval from 0 to p − 1 that differs from x by a multiple of p. For example, 35 mod
9 = 8 (because 8 = 35 − 3 · 9); 36 mod 9 = 0 (because 0 = 36 − 4 · 9); and −3 mod 9 = 6 (because 6 = −3 + 1 · 9).


(a) Generate the sequence of pseudorandom numbers that results from the choices p = 15, x0 = 3, and x1 = 7 until the
sequence starts repeating.

(b) Show that the following formula is equivalent to step (ii) of the algorithm:

$$\begin{bmatrix} x_{n+1} \\ x_{n+2} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x_{n-1} \\ x_n \end{bmatrix} \bmod p \quad \text{for } n = 1, 2, 3, \ldots$$

(c) Use the formula in part (b) to generate the sequence of vectors for the choices p = 21, x0 = 5, and x1 = 5 until the
sequence starts repeating.


Answer: 


(a) 3, 7, 10, 2, 12, 14, 11, 10, 6, 1, 7, 8, 0, 8, 8, 1, 9, 10, 4, 14, 3, 2, 5, 7, 12, 4, 1, 5, 6, 11, 2, 13, 0, 13, 13, 11, 9, 5, 14, 4, 3, 7, ...
(c) (5, 5), (10, 15), (4, 19), (2, 0), (2, 2), (4, 6), (10, 16), (5, 0), (5, 5), ...


Remark If we take p = 1 and pick x0 and x1 from the interval [0, 1), then the above random-number generator produces
pseudorandom numbers in the interval [0, 1). The resulting scheme is precisely Arnold's cat map. Furthermore, if we eliminate
the modular arithmetic in the algorithm and take x0 = x1 = 1, then the resulting sequence of integers is the famous Fibonacci
sequence, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ..., in which each number after the first two is the sum of the preceding two
numbers.
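
The generator in Exercise 3 is short enough to sketch directly. The Python below is an illustration with hypothetical function names; it produces the terms of the sequence and counts how many terms occur before the starting pair (x0, x1) recurs.

```python
def fibonacci_shift_register(p, x0, x1, count):
    """Return the first `count` terms of x_{n+1} = (x_n + x_{n-1}) mod p."""
    seq = [x0, x1]
    while len(seq) < count:
        seq.append((seq[-1] + seq[-2]) % p)
    return seq[:count]


def cycle_length(p, x0, x1):
    """Number of terms generated before the pair (x0, x1) appears again."""
    a, b = x0, x1
    steps = 0
    while True:
        a, b = b, (a + b) % p
        steps += 1
        if (a, b) == (x0, x1):
            return steps


if __name__ == "__main__":
    n = cycle_length(15, 3, 7)
    print(n, fibonacci_shift_register(15, 3, 7, n))
```

The search in `cycle_length` always ends because the update (a, b) → (b, a + b) mod p is invertible, so the starting pair must eventually reappear.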


4. For

$$C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$$

it can be verified that

$$C^{25} = \begin{bmatrix} 7{,}778{,}742{,}049 & 12{,}586{,}269{,}025 \\ 12{,}586{,}269{,}025 & 20{,}365{,}011{,}074 \end{bmatrix}$$

It can also be verified that 12,586,269,025 is divisible by 101 and that when 7,778,742,049 and 20,365,011,074 are divided by
101, the remainder is 1.


(a) Show that every point in S of the form (m/101, n/101) returns to its starting position after 25 iterations under Arnold's
cat map.

(b) Show that every point in S of the form (m/101, n/101) has period 1, 5, or 25.

(c) Show that the point (1/101, 0) has period greater than 5 by iterating it five times.

(d) Show that Π(101) = 25.
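
The arithmetic facts quoted in Exercise 4 can be confirmed with exact integer arithmetic. The following Python sketch is an added illustration; `mat_mul` and `mat_pow` are hypothetical helper names, and the power is computed by plain repeated multiplication.

```python
def mat_mul(A, B):
    """Product of two 2 x 2 integer matrices given as nested lists."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def mat_pow(A, n):
    """n-th power of a 2 x 2 matrix by repeated multiplication (n >= 0)."""
    result = [[1, 0], [0, 1]]
    for _ in range(n):
        result = mat_mul(result, A)
    return result


C = [[1, 1], [1, 2]]
C25 = mat_pow(C, 25)
# Reduce every entry modulo 101 to compare with the identity matrix.
C25_mod_101 = [[entry % 101 for entry in row] for row in C25]

if __name__ == "__main__":
    print(C25)
    print(C25_mod_101)
```

Python integers have arbitrary precision, so the eleven-digit entries are exact.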


Answer: 

(c) The first five iterates of (1/101, 0) are (1/101, 1/101), (2/101, 3/101), (5/101, 8/101), (13/101, 21/101), and (34/101, 55/101).

5. Show that for the mapping T: S → S defined by T(x, y) = (x + 5/12, y) mod 1, every point in S is a periodic point. Why does
this show that the mapping is not chaotic?


6. An Anosov automorphism on R^2 is a mapping from the unit square S onto S of the form

$$\begin{bmatrix} x \\ y \end{bmatrix} \to \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \bmod 1$$

in which (i) a, b, c, and d are integers, (ii) the determinant of the matrix is ±1, and (iii) the eigenvalues of the matrix do not
have magnitude 1. It can be shown that all Anosov automorphisms are chaotic mappings.


(a) Show that Arnold's cat map is an Anosov automorphism. 


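(b) Which of the following are the matrices of an Anosov automorphism?

$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 3 & 2 \\ 1 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 5 & 7 \\ 2 & 3 \end{bmatrix}, \quad \begin{bmatrix} 6 & 2 \\ 5 & 2 \end{bmatrix}$$

(c) Show that the following mapping of S onto S is not an Anosov automorphism.

$$\begin{bmatrix} x \\ y \end{bmatrix} \to \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \bmod 1$$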


What is the geometric effect of this transformation on S? Use your observation to show that the mapping is not a chaotic 
mapping by showing that all points in S are periodic points. 
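
The three conditions in the definition of an Anosov automorphism can be tested mechanically for a 2 × 2 matrix. The sketch below is illustrative only; `is_anosov` is a hypothetical name, and the eigenvalues are obtained from the characteristic equation λ^2 − (a + d)λ + (ad − bc) = 0 with a small tolerance for rounding.

```python
import cmath


def is_anosov(M, tol=1e-9):
    """Check conditions (i)-(iii) for the 2 x 2 matrix M = [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    # (i) integer entries
    if not all(isinstance(v, int) for v in (a, b, c, d)):
        return False
    # (ii) determinant equal to +1 or -1
    det = a * d - b * c
    if det not in (1, -1):
        return False
    # (iii) no eigenvalue of magnitude 1; roots of the characteristic equation
    trace = a + d
    root = cmath.sqrt(trace * trace - 4 * det)
    eigenvalues = ((trace + root) / 2, (trace - root) / 2)
    return all(abs(abs(lam) - 1) > tol for lam in eigenvalues)


if __name__ == "__main__":
    print(is_anosov([[1, 1], [1, 2]]))    # matrix of Arnold's cat map
    print(is_anosov([[0, 1], [-1, 0]]))   # rotation through 90 degrees
```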


Answer: 


(b) The matrices of Anosov automorphisms are $\begin{bmatrix} 3 & 2 \\ 1 & 1 \end{bmatrix}$ and $\begin{bmatrix} 5 & 7 \\ 2 & 3 \end{bmatrix}$.


(c) The transformation effects a rotation of S through 90° in the clockwise direction.


7. Show that Arnold's cat map is one-to-one over the unit square S and that its range is S. 
8. Show that the inverse of Arnold's cat map is given by
Γ^{-1}(x, y) = (2x − y, −x + y) mod 1


9. Show that the unit square S can be partitioned into four triangular regions on each of which Arnold's cat map is a
transformation of the form

$$\begin{bmatrix} x \\ y \end{bmatrix} \to \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} a \\ b \end{bmatrix}$$

where a and b need not be the same for each region. [Hint: Find the regions in S that map onto the four shaded regions of the
parallelogram in Figure 10.14.1d.]


Answer: 


(0, 1) (a, w (OL) (l/2,1)) C1) 


iS [EEE 


(0, 0) (1,0) (0.0) (1/2,0) (1,0) 


I corte |e eh odes ion te |? | = Oo}. aie | oe ion tv: |7] =| 7! 
nregion I: | | = 0 sin region II: | , |= 1 sin region IIT: | | = 4 sin region IV: | | = 5 
10. If (x0, y0) is a point in S and (xn, yn) is its nth iterate under Arnold's cat map, show that

$$\begin{bmatrix} x_n \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^n \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1$$

This result implies that the modular arithmetic need only be performed once rather than after each iteration.


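11. Show that (0, 0) is the only fixed point of Arnold's cat map by showing that the only solution of the equation

$$\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1$$

with 0 ≤ x0 < 1 and 0 ≤ y0 < 1 is x0 = y0 = 0. [Hint: For appropriate nonnegative integers, r and s, we can write

$$\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} - \begin{bmatrix} r \\ s \end{bmatrix}$$

for the preceding equation.]


12. Find all 2-cycles of Arnold's cat map by finding all solutions of the equation

$$\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^2 \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1$$

with 0 ≤ x0 < 1 and 0 ≤ y0 < 1. [Hint: For appropriate nonnegative integers, r and s, we can write

$$\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} - \begin{bmatrix} r \\ s \end{bmatrix}$$

for the preceding equation.]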


Answer: 
(1/5, 3/5) and (4/5, 2/5) form one 2-cycle, and (2/5, 1/5) and (3/5, 4/5) form another 2-cycle.


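13. Show that every periodic point of Arnold's cat map must be a rational point by showing that for all solutions of the equation

$$\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^n \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1$$

the numbers x0 and y0 are quotients of integers.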


14. Let T be Arnold's cat map applied five times in a row; that is, T = Γ^5. Figure Ex-14 represents four successive mappings
of T on the first image, each image having a resolution of 101 × 101 pixels. The fifth mapping returns to the first image
because this cat map has a period of 25. Explain how you might generate this particular sequence of images.


Figure Ex-14 


Answer: 


Begin with a 101 x 101 array of white pixels and add the letter ‘A’ in black pixels to it. Apply the mapping to this image, 
which will scatter the black pixels throughout the image. Then superimpose the letter ‘B’ in black pixels onto this image. 
Apply the mapping again and then superimpose the letter ‘C’ in black pixels onto the resulting image. Repeat this procedure 
with the letters ‘D’ and ‘E’. The next application of the mapping will return you to the letter ‘A’ with the pixels for the letters 
‘B’ through ‘E’ scattered in the background. 


Section 10.14 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, 
Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some 
linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are 
using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have 
mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the 
regular exercise sets. 


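T1. The methods of Exercise 4 show that for the cat map, Π(p) is the smallest positive integer satisfying the equation

$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{\Pi(p)} \bmod p = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

This suggests that one way to determine Π(p) is to compute

$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{n} \bmod p$$

starting with n = 1 and stopping when this produces the identity matrix. Use this idea to compute Π(p) for p = 2, 3, ..., 10.
Compare your results to the formulas given in Exercise 1, if they apply. What can you conjecture about

$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{\Pi(p)/2} \bmod p$$

when Π(p) is even?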
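
One way to carry out the computation suggested in Exercise T1, sketched in Python rather than in one of the utilities named above. The function name `cat_matrix_order` is illustrative, and the running power is reduced mod p after every multiplication so that the entries stay small.

```python
def cat_matrix_order(p):
    """Smallest n >= 1 with [[1, 1], [1, 2]]^n congruent to the identity mod p (p >= 2)."""
    identity = (1, 0, 0, 1)
    # Entries (a, b, c, d) of the current power [[a, b], [c, d]], starting at n = 1.
    a, b, c, d = 1 % p, 1 % p, 1 % p, 2 % p
    n = 1
    while (a, b, c, d) != identity:
        # Multiply on the right by [[1, 1], [1, 2]] and reduce mod p.
        a, b, c, d = (a + b) % p, (a + 2 * b) % p, (c + d) % p, (c + 2 * d) % p
        n += 1
    return n


if __name__ == "__main__":
    for p in range(2, 11):
        print(p, cat_matrix_order(p))
```

The loop ends because the matrix has determinant 1 and is therefore invertible mod p, so its powers must cycle back to the identity.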


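T2. The eigenvalues and eigenvectors for the cat map matrix

$$C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$$

are

$$\lambda_1 = \frac{3 + \sqrt{5}}{2}, \quad \lambda_2 = \frac{3 - \sqrt{5}}{2}, \quad \mathbf{v}_1 = \begin{bmatrix} 1 \\ \frac{1 + \sqrt{5}}{2} \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ \frac{1 - \sqrt{5}}{2} \end{bmatrix}$$

Using these eigenvalues and eigenvectors, we can define

$$D = \begin{bmatrix} \frac{3 + \sqrt{5}}{2} & 0 \\ 0 & \frac{3 - \sqrt{5}}{2} \end{bmatrix} \quad \text{and} \quad P = \begin{bmatrix} 1 & 1 \\ \frac{1 + \sqrt{5}}{2} & \frac{1 - \sqrt{5}}{2} \end{bmatrix}$$

and write C = P D P^{-1}; hence, C^n = P D^n P^{-1}. Use a computer to show that

$$C^n = \begin{bmatrix} c_{11}^{(n)} & c_{12}^{(n)} \\ c_{21}^{(n)} & c_{22}^{(n)} \end{bmatrix}$$

where

$$c_{11}^{(n)} = \frac{1 + \sqrt{5}}{2\sqrt{5}} \left( \frac{3 - \sqrt{5}}{2} \right)^n - \frac{1 - \sqrt{5}}{2\sqrt{5}} \left( \frac{3 + \sqrt{5}}{2} \right)^n$$

$$c_{22}^{(n)} = \frac{1 + \sqrt{5}}{2\sqrt{5}} \left( \frac{3 + \sqrt{5}}{2} \right)^n - \frac{1 - \sqrt{5}}{2\sqrt{5}} \left( \frac{3 - \sqrt{5}}{2} \right)^n$$

and

$$c_{12}^{(n)} = c_{21}^{(n)} = \frac{1}{\sqrt{5}} \left[ \left( \frac{3 + \sqrt{5}}{2} \right)^n - \left( \frac{3 - \sqrt{5}}{2} \right)^n \right]$$

How can you use these results and your conclusions in Exercise T1 to simplify the method for computing Π(p)?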




10.15 Cryptography 


In this section we present a method of encoding and decoding messages. We also examine modular arithmetic and show 
how Gaussian elimination can sometimes be used to break an opponent's code. 


Prerequisites 


Matrices 

Gaussian Elimination 
Matrix Operations 
Linear Independence 


Linear Transformations (Section 4.9) 


Ciphers 


The study of encoding and decoding secret messages is called cryptography. Although secret codes date to the earliest days 
of written communication, there has been a recent surge of interest in the subject because of the need to maintain the 
privacy of information transmitted over public lines of communication. In the language of cryptography, codes are called 
ciphers, uncoded messages are called plaintext, and coded messages are called ciphertext. The process of converting from 
plaintext to ciphertext is called enciphering, and the reverse process of converting from ciphertext to plaintext is called 
deciphering. 


The simplest ciphers, called substitution ciphers, are those that replace each letter of the alphabet by a different letter. For 
example, in the substitution cipher 


Plain   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cipher  D E F G H I J K L M N O P Q R S T U V W X Y Z A B C


the plaintext letter A is replaced by D, the plaintext letter B by E, and so forth. With this cipher the plaintext message 
ROME WAS NOT BUILT IN A DAY 
becomes 


URPH ZDV QRW EXLOW LQ D GDB
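
The substitution cipher in this example shifts each letter three places forward in the alphabet, wrapping from Z back to A. A minimal Python sketch of that rule follows; the function name and the `shift` parameter are illustrative and not part of the text.

```python
import string


def shift_encipher(plaintext, shift=3):
    """Replace each capital letter by the letter `shift` places later, wrapping at Z."""
    out = []
    for ch in plaintext:
        if ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)  # spaces and other characters pass through unchanged
    return "".join(out)


if __name__ == "__main__":
    print(shift_encipher("ROME WAS NOT BUILT IN A DAY"))
```

Deciphering uses the same function with the shift negated, for example `shift_encipher(ciphertext, -3)`.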


Hill Ciphers 


A disadvantage of substitution ciphers is that they preserve the frequencies of individual letters, making it relatively easy to 
break the code by statistical methods. One way to overcome this problem is to divide the plaintext into groups of letters and 
encipher the plaintext group by group, rather than one letter at a time. A system of cryptography in which the plaintext is 
divided into sets of n letters, each of which is replaced by a set of n cipher letters, is called a polygraphic system. In this 
section we will study a class of polygraphic systems based on matrix transformations. [The ciphers that we will discuss are 
called Hill ciphers after Lester S. Hill, who introduced them in two papers: “Cryptography in an Algebraic Alphabet,” 
American Mathematical Monthly, 36 (June–July 1929), pp. 306–312; and “Concerning Certain Linear Transformation
Apparatus of Cryptography,” American Mathematical Monthly, 38 (March 1931), pp. 135–154.]


In the discussion to follow, we assume that each plaintext and ciphertext letter except Z is assigned the numerical value that 
specifies its position in the standard alphabet (Table 1). For reasons that will become clear later, Z is assigned a value of 
zero. 


Table 1 


A  B  C  D  E  F  G  H  I  J   K   L   M   N   O   P   Q   R   S   T   U   V   W   X   Y   Z
1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  0


In the simplest Hill ciphers, successive pairs of plaintext are transformed into ciphertext by the following procedure:


Step 1 Choose a 2 × 2 matrix with integer entries

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

to perform the encoding. Certain additional conditions on A will be imposed later.


Step 2 Group successive plaintext letters into pairs, adding an arbitrary “dummy” letter to fill out the last pair if the
plaintext has an odd number of letters, and replace each plaintext letter by its numerical value.


Step 3 Successively convert each plaintext pair p1 p2 into a column vector

$$\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix}$$

and form the product Ap. We will call p a plaintext vector and Ap the corresponding ciphertext vector.


Step 4 Convert each ciphertext vector into its alphabetic equivalent.


EXAMPLE 1 Hill Cipher of a Message


Use the matrix

$$\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}$$

to obtain the Hill cipher for the plaintext message
I AM HIDING

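Solution If we group the plaintext into pairs and add the dummy letter G to fill out the last pair, we obtain
IA MH ID IN GG
or, equivalently, from Table 1,
9 1    13 8    9 4    9 14    7 7


To encipher the pair IA, we form the matrix product

$$\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 9 \\ 1 \end{bmatrix} = \begin{bmatrix} 11 \\ 3 \end{bmatrix}$$

which, from Table 1, yields the ciphertext KC.
To encipher the pair MH, we form the product

$$\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 13 \\ 8 \end{bmatrix} = \begin{bmatrix} 29 \\ 24 \end{bmatrix} \qquad (1)$$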


However, there is a problem here, because the number 29 has no alphabet equivalent (Table 1). To resolve 
this problem, we make the following agreement: 


Whenever an integer greater than 25 occurs, it will be 
replaced by the remainder that results when this 
integer is divided by 26 . 


Because the remainder after division by 26 is one of the integers 0, 1, 2, ..., 25, this procedure will always 
yield an integer with an alphabet equivalent. 


Thus, in (1) we replace 29 by 3, which is the remainder after dividing 29 by 26. It now follows from Table 1
that the ciphertext for the pair MH is CX.


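The computations for the remaining ciphertext vectors are

$$\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 9 \\ 4 \end{bmatrix} = \begin{bmatrix} 17 \\ 12 \end{bmatrix}$$

$$\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 9 \\ 14 \end{bmatrix} = \begin{bmatrix} 37 \\ 42 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} 11 \\ 16 \end{bmatrix}$$

$$\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 7 \\ 7 \end{bmatrix} = \begin{bmatrix} 21 \\ 21 \end{bmatrix}$$

These correspond to the ciphertext pairs QL, KP, and UU, respectively. In summary, the entire ciphertext
message is


KC CX QL KP UU


which would usually be transmitted as a single string without spaces:


KCCXQLKPUU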
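
The four steps of the Hill 2-cipher can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed implementation: the helper names are hypothetical, the numerical values follow Table 1 (A = 1, ..., Y = 25, Z = 0), and when no dummy letter is supplied the last letter is repeated, as was done with G in Example 1.

```python
def letter_to_number(ch):
    """Table 1 value of a capital letter: A = 1, ..., Y = 25, Z = 0."""
    return (ord(ch) - ord("A") + 1) % 26


def number_to_letter(k):
    """Inverse of letter_to_number for any integer k, reduced modulo 26."""
    return chr((k - 1) % 26 + ord("A"))


def hill2_encipher(plaintext, A, dummy=None):
    """Encipher `plaintext` pair by pair with the 2 x 2 integer matrix A."""
    letters = [ch for ch in plaintext.upper() if ch.isalpha()]
    if len(letters) % 2 == 1:
        letters.append(dummy if dummy is not None else letters[-1])
    cipher = []
    for i in range(0, len(letters), 2):
        p1 = letter_to_number(letters[i])
        p2 = letter_to_number(letters[i + 1])
        # Ciphertext vector Ap, with each entry replaced by its remainder mod 26.
        c1 = (A[0][0] * p1 + A[0][1] * p2) % 26
        c2 = (A[1][0] * p1 + A[1][1] * p2) % 26
        cipher.append(number_to_letter(c1) + number_to_letter(c2))
    return "".join(cipher)


if __name__ == "__main__":
    print(hill2_encipher("I AM HIDING", [[1, 2], [0, 3]]))
```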


Because the plaintext was grouped in pairs and enciphered by a 2 × 2 matrix, the Hill cipher in Example 1 is referred to as a
Hill 2-cipher. It is obviously also possible to group the plaintext in triples and encipher by a 3 × 3 matrix with integer
entries; this is called a Hill 3-cipher. In general, for a Hill n-cipher, plaintext is grouped into sets of n letters and
enciphered by an n × n matrix with integer entries.


Modular Arithmetic 


In Example 1, integers greater than 25 were replaced by their remainders after division by 26. This technique of working 
with remainders is at the core of a body of mathematics called modular arithmetic. Because of its importance in 
cryptography, we will digress for a moment to touch on some of the main ideas in this area. 


In modular arithmetic we are given a positive integer m, called the modulus, and any two integers whose difference is an 
integer multiple of the modulus are regarded as “equal” or “equivalent” with respect to the modulus. More precisely, we 
make the following definition. 


DEFINITION 1 


If m is a positive integer and a and b are any integers, then we say that a is equivalent to b modulo m, written
a ≡ b (mod m)


if a − b is an integer multiple of m.


EXAMPLE 2 Various Equivalences 


7 ≡ 2 (mod 5)
19 ≡ 3 (mod 2)
−1 ≡ 25 (mod 26)
12 ≡ 0 (mod 4)


For any modulus m it can be proved that every integer a is equivalent, modulo m, to exactly one of the integers
0, 1, 2, ..., m − 1
We call this integer the residue of a modulo m, and we write
Z_m = {0, 1, 2, ..., m − 1}
to denote the set of residues modulo m. 


If a is a nonnegative integer, then its residue modulo m is simply the remainder that results when a is divided by m. For an 
arbitrary integer a, the residue can be found using the following theorem. 


THEOREM 10.15.1 


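For any integer a and modulus m, let

$$R = \text{remainder of } \frac{|a|}{m}$$

Then the residue r of a modulo m is given by

$$r = \begin{cases} R & \text{if } a \ge 0 \\ m - R & \text{if } a < 0 \text{ and } R \ne 0 \\ 0 & \text{if } a < 0 \text{ and } R = 0 \end{cases}$$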


EXAMPLE 3 Residues mod 26
Find the residue modulo 26 of (a) 87, (b) −38, and (c) −26.


Solution
(a) Dividing |87| = 87 by 26 yields a remainder of R = 9, so r = 9. Thus,
87 ≡ 9 (mod 26)
(b) Dividing |−38| = 38 by 26 yields a remainder of R = 12, so r = 26 − 12 = 14. Thus,
−38 ≡ 14 (mod 26)
(c) Dividing |−26| = 26 by 26 yields a remainder of R = 0. Thus,
−26 ≡ 0 (mod 26)
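
Theorem 10.15.1 translates directly into code. The Python sketch below follows the three cases of the theorem; the function name `residue` is illustrative. Python's built-in `%` operator already returns a result in 0, 1, ..., m − 1 for a positive modulus, so it can serve as a cross-check.

```python
def residue(a, m):
    """Residue of the integer a modulo the positive integer m, per Theorem 10.15.1."""
    R = abs(a) % m          # remainder when |a| is divided by m
    if a >= 0:
        return R
    if R != 0:
        return m - R        # a < 0 and R != 0
    return 0                # a < 0 and R == 0


if __name__ == "__main__":
    for a in (87, -38, -26):
        print(a, residue(a, 26), a % 26)
```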


In ordinary arithmetic every nonzero number a has a reciprocal or multiplicative inverse, denoted by a^{-1}, such that

$$a a^{-1} = a^{-1} a = 1$$

In modular arithmetic we have the following corresponding concept:


DEFINITION 2 


If a is a number in Z_m, then a number a^{-1} in Z_m is called a reciprocal or multiplicative inverse of a modulo m if


a a^{-1} = a^{-1} a ≡ 1 (mod m).


It can be proved that if a and m have no common prime factors, then a has a unique reciprocal modulo m; conversely, if a 
and m have a common prime factor, then a has no reciprocal modulo m. 


EXAMPLE 4 Reciprocal of 3 mod 26


The number 3 has a reciprocal modulo 26 because 3 and 26 have no common prime factors. This reciprocal
can be obtained by finding the number x in Z_26 that satisfies the modular equation


3x ≡ 1 (mod 26)


Although there are general methods for solving such modular equations, it would take us too far afield to
study them. However, because 26 is relatively small, this equation can be solved by trying the possible
solutions, 0 to 25, one at a time. With this approach we find that x = 9 is the solution, because


3 · 9 = 27 ≡ 1 (mod 26)
Thus,
3^{-1} ≡ 9 (mod 26)


EXAMPLE 5 A Number with No Reciprocal mod 26


The number 4 has no reciprocal modulo 26, because 4 and 26 have 2 as a common prime factor (see Exercise 
8). 


For future reference, in Table 2 we provide the following reciprocals modulo 26: 


Table 2 Reciprocals Modulo 26 
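
The reciprocals modulo 26 can be regenerated with the trial-and-error search of Example 4. The Python sketch below is an added illustration with a hypothetical function name; it tries each candidate x for every a that shares no prime factor with the modulus.

```python
from math import gcd


def reciprocal_table(m):
    """Map each a in Z_m that has a reciprocal modulo m to that reciprocal."""
    table = {}
    for a in range(1, m):
        if gcd(a, m) != 1:
            continue  # a and m share a prime factor, so a has no reciprocal
        for x in range(1, m):
            if (a * x) % m == 1:
                table[a] = x
                break
    return table


if __name__ == "__main__":
    for a, a_inv in reciprocal_table(26).items():
        print(a, a_inv)
```

Only the twelve odd values of a other than 13 appear in the output, in agreement with the condition that a and 26 have no common prime factors.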


Deciphering 


Every useful cipher must have a procedure for decipherment. In the case of a Hill cipher, decipherment uses the inverse
(mod 26) of the enciphering matrix. To be precise, if m is a positive integer, then a square matrix A with entries in Z_m is
said to be invertible modulo m if there is a matrix B with entries in Z_m such that


AB = BA ≡ I (mod m)


Suppose now that

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

is invertible modulo 26 and this matrix is used in a Hill 2-cipher. If

$$\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix}$$

is a plaintext vector, then


c = Ap (mod 26)


is the corresponding ciphertext vector and
p = A^{-1} c (mod 26)
Thus, each plaintext vector can be recovered from the corresponding ciphertext vector by multiplying it on the left by
A^{-1} (mod 26).


In cryptography it is important to know which matrices are invertible modulo 26 and how to obtain their inverses. We now 
investigate these questions. 


In ordinary arithmetic, a square matrix A is invertible if and only if $\det(A) \neq 0$, or, equivalently, if and only if $\det(A)$ has a
reciprocal. The following theorem is the analog of this result in modular arithmetic.


THEOREM 10.15.2


A square matrix A with entries in $Z_m$ is invertible modulo m if and only if the residue of $\det(A)$ modulo m has a
reciprocal modulo m.


Because the residue of $\det(A)$ modulo m will have a reciprocal modulo m if and only if this residue and m have no common
prime factors, we have the following corollary.


COROLLARY 10.15.3


A square matrix A with entries in $Z_m$ is invertible modulo m if and only if m and the residue of $\det(A)$ modulo m
have no common prime factors.


Because the only prime factors of m = 26 are 2 and 13, we have the following corollary, which is useful in cryptography.


COROLLARY 10.15.4


A square matrix A with entries in $Z_{26}$ is invertible modulo 26 if and only if the residue of $\det(A)$ modulo 26 is not
divisible by 2 or 13.


We leave it for you to verify that if

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

has entries in $Z_{26}$ and the residue of $\det(A) = ad - bc$ modulo 26 is not divisible by 2 or 13, then the inverse of A (mod
26) is given by

$$A^{-1} = (ad - bc)^{-1} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \pmod{26} \qquad (2)$$

where $(ad - bc)^{-1}$ is the reciprocal of the residue of $ad - bc$ (mod 26).


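EXAMPLE 6 Inverse of a Matrix mod 26

Find the inverse of

$$A = \begin{bmatrix} 5 & 6 \\ 2 & 3 \end{bmatrix}$$

modulo 26.

Solution

$$\det(A) = ad - bc = 5 \cdot 3 - 6 \cdot 2 = 3$$

so from Table 2,

$$(ad - bc)^{-1} = 3^{-1} = 9 \pmod{26}$$

Thus, from 2,

$$A^{-1} = 9 \begin{bmatrix} 3 & -6 \\ -2 & 5 \end{bmatrix} = \begin{bmatrix} 27 & -54 \\ -18 & 45 \end{bmatrix} = \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \pmod{26}$$

As a check,

$$AA^{-1} = \begin{bmatrix} 5 & 6 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} = \begin{bmatrix} 53 & 234 \\ 26 & 105 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \pmod{26}$$

Similarly, $A^{-1}A = I$.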
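
The invertibility test of Corollary 10.15.4 and the 2 × 2 inverse formula can be tried out with the following sketch. The function name `inverse_2x2_mod` is hypothetical, and the code assumes Python 3.8 or later, where the built-in `pow(x, -1, m)` returns a modular reciprocal.

```python
from math import gcd


def inverse_2x2_mod(matrix, m=26):
    """Inverse of a 2x2 integer matrix modulo m, or None if it is not invertible."""
    (a, b), (c, d) = matrix
    det = (a * d - b * c) % m           # residue of det(A) modulo m
    if gcd(det, m) != 1:                # shares a prime factor with m: no inverse
        return None
    det_inv = pow(det, -1, m)           # reciprocal of the determinant modulo m
    return [[(det_inv * d) % m, (det_inv * -b) % m],
            [(det_inv * -c) % m, (det_inv * a) % m]]


print(inverse_2x2_mod([[5, 6], [2, 3]]))  # [[1, 24], [8, 19]], as in Example 6
print(inverse_2x2_mod([[2, 1], [0, 1]]))  # None: the determinant 2 divides 26
```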


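EXAMPLE 7 Decoding a Hill 2-Cipher

Decode the following Hill 2-cipher, which was enciphered by the matrix in Example 6:

GTNKGKDUSK

Solution From Table 1 the numerical equivalent of this ciphertext is

7 20   14 11   7 11   4 21   19 11

To obtain the plaintext pairs, we multiply each ciphertext vector by the inverse of A (obtained in Example 6):

$$\begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 7 \\ 20 \end{bmatrix} = \begin{bmatrix} 487 \\ 436 \end{bmatrix} = \begin{bmatrix} 19 \\ 20 \end{bmatrix} \pmod{26}$$

$$\begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 14 \\ 11 \end{bmatrix} = \begin{bmatrix} 278 \\ 321 \end{bmatrix} = \begin{bmatrix} 18 \\ 9 \end{bmatrix} \pmod{26}$$

$$\begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 7 \\ 11 \end{bmatrix} = \begin{bmatrix} 271 \\ 265 \end{bmatrix} = \begin{bmatrix} 11 \\ 5 \end{bmatrix} \pmod{26}$$

$$\begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 4 \\ 21 \end{bmatrix} = \begin{bmatrix} 508 \\ 431 \end{bmatrix} = \begin{bmatrix} 14 \\ 15 \end{bmatrix} \pmod{26}$$

$$\begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 19 \\ 11 \end{bmatrix} = \begin{bmatrix} 283 \\ 361 \end{bmatrix} = \begin{bmatrix} 23 \\ 23 \end{bmatrix} \pmod{26}$$

From Table 1, the alphabet equivalents of these vectors are

ST  RI  KE  NO  WW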
which yields the message 


STRIKE NOW 
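
The decoding procedure of Example 7 can be sketched as follows. The helper names are hypothetical, and the letter numbering A = 1, B = 2, ..., Y = 25, Z = 0 is assumed to be the convention of Table 1, as the numerical equivalents used in the examples suggest.

```python
def letter_to_number(ch):
    # Assumed Table 1 convention: A=1, B=2, ..., Y=25, Z=0.
    return (ord(ch) - ord("A") + 1) % 26


def number_to_letter(k):
    return chr((k - 1) % 26 + ord("A"))


def hill2_apply(matrix, text, m=26):
    """Multiply successive letter pairs of `text` by a 2x2 matrix modulo m."""
    nums = [letter_to_number(ch) for ch in text]
    out = []
    for i in range(0, len(nums), 2):
        v1, v2 = nums[i], nums[i + 1]
        out.append((matrix[0][0] * v1 + matrix[0][1] * v2) % m)
        out.append((matrix[1][0] * v1 + matrix[1][1] * v2) % m)
    return "".join(number_to_letter(k) for k in out)


# Deciphering uses the inverse matrix from Example 6.
print(hill2_apply([[1, 24], [8, 19]], "GTNKGKDUSK"))  # STRIKENOWW
```

The same function enciphers when it is given the enciphering matrix, since enciphering and deciphering differ only in which matrix is applied. The text is assumed to contain an even number of uppercase letters.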


Breaking a Hill Cipher 


Because the purpose of enciphering messages and information is to prevent “opponents” from learning their contents, 
cryptographers are concerned with the security of their ciphers—that is, how readily they can be broken (deciphered by 
their opponents). We will conclude this section by discussing one technique for breaking Hill ciphers. 


Suppose that you are able to obtain some corresponding plaintext and ciphertext from an opponent's message. For example, 
on examining some intercepted ciphertext, you may be able to deduce that the message is a letter that begins DEAR SIR. We 
will show that with a small amount of such data, it may be possible to determine the deciphering matrix of a Hill code and 
consequently obtain access to the rest of the message. 


It is a basic result in linear algebra that a linear transformation is completely determined by its values at a basis. This
principle suggests that if we have a Hill n-cipher, and if

$$\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$$

are linearly independent plaintext vectors whose corresponding ciphertext vectors

$$A\mathbf{p}_1, A\mathbf{p}_2, \ldots, A\mathbf{p}_n$$

are known, then there is enough information available to determine the matrix A and hence $A^{-1}$ (mod m).


The following theorem, whose proof is discussed in the exercises, provides a way to do this.


THEOREM 10.15.5 Determining the Deciphering Matrix


Let $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$ be linearly independent plaintext vectors, and let $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$ be the corresponding ciphertext
vectors in a Hill n-cipher. If

$$P = \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \\ \vdots \\ \mathbf{p}_n^T \end{bmatrix}$$

is the $n \times n$ matrix with row vectors $\mathbf{p}_1^T, \mathbf{p}_2^T, \ldots, \mathbf{p}_n^T$ and if

$$C = \begin{bmatrix} \mathbf{c}_1^T \\ \mathbf{c}_2^T \\ \vdots \\ \mathbf{c}_n^T \end{bmatrix}$$

is the $n \times n$ matrix with row vectors $\mathbf{c}_1^T, \mathbf{c}_2^T, \ldots, \mathbf{c}_n^T$, then the sequence of elementary row operations that reduces C
to I transforms P to $(A^{-1})^T$.


This theorem tells us that to find the transpose of the deciphering matrix $A^{-1}$, we must find a sequence of row operations
that reduces C to I and then perform this same sequence of operations on P. The following example illustrates a simple
algorithm for doing this.


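EXAMPLE 8 Using Theorem 10.15.5

The following Hill 2-cipher is intercepted:

IOSBTGXESPXHOPDE

Decipher the message, given that it starts with the word DEAR.

Solution From Table 1, the numerical equivalent of the known plaintext is

D E    A R
4 5    1 18

and the numerical equivalent of the corresponding ciphertext is

I O    S B
9 15   19 2

so the corresponding plaintext and ciphertext vectors are

$$\mathbf{p}_1 = \begin{bmatrix} 4 \\ 5 \end{bmatrix} \leftrightarrow \mathbf{c}_1 = \begin{bmatrix} 9 \\ 15 \end{bmatrix}, \qquad \mathbf{p}_2 = \begin{bmatrix} 1 \\ 18 \end{bmatrix} \leftrightarrow \mathbf{c}_2 = \begin{bmatrix} 19 \\ 2 \end{bmatrix}$$

We want to reduce

$$C = \begin{bmatrix} \mathbf{c}_1^T \\ \mathbf{c}_2^T \end{bmatrix} = \begin{bmatrix} 9 & 15 \\ 19 & 2 \end{bmatrix}$$

to I by elementary row operations and simultaneously apply these operations to

$$P = \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \end{bmatrix} = \begin{bmatrix} 4 & 5 \\ 1 & 18 \end{bmatrix}$$

to obtain $(A^{-1})^T$ (the transpose of the deciphering matrix). This can be accomplished by adjoining P to the
right of C and applying row operations to the resulting matrix $[C \mid P]$ until the left side is reduced to I. The
final matrix will then have the form $[I \mid (A^{-1})^T]$. The computations can be carried out as follows: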


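$$\left[\begin{array}{cc|cc} 9 & 15 & 4 & 5 \\ 19 & 2 & 1 & 18 \end{array}\right]$$
We formed the matrix $[C \mid P]$.

$$\left[\begin{array}{cc|cc} 1 & 45 & 12 & 15 \\ 19 & 2 & 1 & 18 \end{array}\right]$$
We multiplied the first row by $9^{-1} = 3$.

$$\left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 19 & 2 & 1 & 18 \end{array}\right]$$
We replaced 45 by its residue modulo 26.

$$\left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & -359 & -227 & -267 \end{array}\right]$$
We added −19 times the first row to the second.

$$\left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & 5 & 7 & 19 \end{array}\right]$$
We replaced the entries in the second row by their residues modulo 26.

$$\left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & 105 & 147 & 399 \end{array}\right]$$
We multiplied the second row by $5^{-1} = 21$.

$$\left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & 1 & 17 & 9 \end{array}\right]$$
We replaced the entries in the second row by their residues modulo 26.

$$\left[\begin{array}{cc|cc} 1 & 0 & -311 & -156 \\ 0 & 1 & 17 & 9 \end{array}\right]$$
We added −19 times the second row to the first.

$$\left[\begin{array}{cc|cc} 1 & 0 & 1 & 0 \\ 0 & 1 & 17 & 9 \end{array}\right]$$
We replaced the entries in the first row by their residues modulo 26.

Thus,

$$(A^{-1})^T = \begin{bmatrix} 1 & 0 \\ 17 & 9 \end{bmatrix}$$

so the deciphering matrix is

$$A^{-1} = \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix}$$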

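To decipher the message, we first group the ciphertext into pairs and find the numerical equivalent of each
letter:

I O    S B    T G    X E    S P     X H    O P     D E
9 15   19 2   20 7   24 5   19 16   24 8   15 16   4 5

Next, we multiply successive ciphertext vectors on the left by $A^{-1}$ and find the alphabet equivalents of the
resulting plaintext pairs:

$$\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 9 \\ 15 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix} \pmod{26} \qquad \text{D E}$$

$$\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 19 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 18 \end{bmatrix} \pmod{26} \qquad \text{A R}$$

$$\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 20 \\ 7 \end{bmatrix} = \begin{bmatrix} 9 \\ 11 \end{bmatrix} \pmod{26} \qquad \text{I K}$$

$$\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 24 \\ 5 \end{bmatrix} = \begin{bmatrix} 5 \\ 19 \end{bmatrix} \pmod{26} \qquad \text{E S}$$

$$\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 19 \\ 16 \end{bmatrix} = \begin{bmatrix} 5 \\ 14 \end{bmatrix} \pmod{26} \qquad \text{E N}$$

$$\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 24 \\ 8 \end{bmatrix} = \begin{bmatrix} 4 \\ 20 \end{bmatrix} \pmod{26} \qquad \text{D T}$$

$$\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 15 \\ 16 \end{bmatrix} = \begin{bmatrix} 1 \\ 14 \end{bmatrix} \pmod{26} \qquad \text{A N}$$

$$\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 4 \\ 5 \end{bmatrix} = \begin{bmatrix} 11 \\ 19 \end{bmatrix} \pmod{26} \qquad \text{K S}$$

Finally, we construct the message from the plaintext pairs:

DE AR IK ES EN DT AN KS
DEAR IKE SEND TANKS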
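
The row-reduction procedure of Theorem 10.15.5 can be sketched in code. The following is a minimal illustration, not a complete cryptanalysis tool: the function name `deciphering_matrix` is hypothetical, Python 3.8 or later is assumed for `pow(x, -1, m)`, and the pivot rule is a simplification that only accepts a pivot with a reciprocal modulo m. It may therefore give up on some matrices C that are in fact invertible modulo 26.

```python
from math import gcd


def deciphering_matrix(P, C, m=26):
    """Reduce [C | P] to [I | (A^-1)^T] modulo m and return A^-1.

    P and C are lists of rows (plaintext and ciphertext vectors written as rows).
    """
    n = len(C)
    aug = [[x % m for x in C[i]] + [x % m for x in P[i]] for i in range(n)]
    for col in range(n):
        # Look for a row whose entry in this column has a reciprocal modulo m.
        pivot = next((r for r in range(col, n) if gcd(aug[r][col], m) == 1), None)
        if pivot is None:
            raise ValueError("no pivot with a reciprocal; this simple sketch gives up")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, m)
        aug[col] = [(inv * x) % m for x in aug[col]]          # scale pivot row to 1
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [(x - factor * y) % m for x, y in zip(aug[r], aug[col])]
    right = [row[n:] for row in aug]                          # this is (A^-1)^T
    return [list(column) for column in zip(*right)]           # transpose to get A^-1


# Known plaintext DEAR and ciphertext IOSB from Example 8.
print(deciphering_matrix([[4, 5], [1, 18]], [[9, 15], [19, 2]]))  # [[1, 17], [0, 9]]
```

Residues are taken after every operation here, whereas the worked example reduces them in separate steps; the final matrix is the same.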


Further Readings 


Readers interested in learning more about mathematical cryptography are referred to the following books, the first 
of which is elementary and the second more advanced. 


1. Abraham Sinkov, Elementary Cryptanalysis, a Mathematical Approach (Mathematical Association of America, 2009). 


2. Alan G. Konheim, Cryptography, a Primer (New York: Wiley-Interscience, 1981). 


Exercise Set 10.15 


1. Obtain the Hill cipher of the message

DARK NIGHT

for each of the following enciphering matrices:

(a) $\begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 4 & 3 \\ 1 & 2 \end{bmatrix}$

Answer:

(a) GIYUOKEVBH
(b) SFANEFZWJH


2. In each part determine whether the matrix is invertible modulo 26. If so, find its inverse modulo 26 and check your work
by verifying that $AA^{-1} = A^{-1}A = I$ (mod 26).


maf} 
(c) a=|{ i 
4/7} 
ea-[2)] 
Oa[ty 
Answer 

© ef 7 


(b) Not invertible 


(c) g-l 1 19 
ss =|,5 | 


(d) Not invertible 
(e) Not invertible 


(gaa 11S 2 
* =|; | 


3. Decode the message

SAKNOXAOJX

given that it is a Hill cipher with enciphering matrix

$$\begin{bmatrix} 4 & 1 \\ 3 & 2 \end{bmatrix}$$


Answer: 


WE LOVE MATH 


4. A Hill 2-cipher is intercepted that starts with the pairs

SL HK

Find the deciphering and enciphering matrices, given that the plaintext is known to start with the word ARMY.

Answer:

Deciphering matrix $= \begin{bmatrix} 7 & 15 \\ 6 & 5 \end{bmatrix}$; enciphering matrix $= \begin{bmatrix} 7 & 5 \\ 2 & 15 \end{bmatrix}$


5. Decode the following Hill 2-cipher if the last four plaintext letters are known to be ATOM.

LNGIHGYBVRENJYQO


Answer: 
THEY SPLIT THE ATOM 
6. Decode the following Hill 3-cipher if the first nine plaintext letters are IHAVECOME: 
HPAP OGGDUGDDHPGODYNOR 
Answer: 


I HAVE COME TO BURY CAESAR 


7. All of the results of this section can be generalized to the case where the plaintext is a binary message; that is, it is a
sequence of 0's and 1's. In this case we do all of our modular arithmetic using modulus 2 rather than modulus 26. Thus,
for example, 1 + 1 = 0 (mod 2). Suppose we want to encrypt the message 110101111. Let us first break it into triplets to
form the three vectors

$$\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

and let us take

$$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

as our enciphering matrix.

(a) Find the encoded message.

(b) Find the inverse modulo 2 of the enciphering matrix, and verify that it decodes your encoded message.

Answer:

(a) 010110001

(b) $\begin{bmatrix} 0 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$


8. If, in addition to the standard alphabet, a period, comma, and question mark were allowed, then 29 plaintext and
ciphertext symbols would be available and all matrix arithmetic would be done modulo 29. Under what conditions
would a matrix with entries in $Z_{29}$ be invertible modulo 29?

Answer:

A is invertible modulo 29 if and only if $\det(A) \neq 0$ (mod 29).

9. Show that the modular equation $4x = 1$ (mod 26) has no solution in $Z_{26}$ by successively substituting the values
x = 0, 1, 2, ..., 25.

10. (a) Let P and C be the matrices in Theorem 10.15.5. Show that $P = C(A^{-1})^T$.

(b) To prove Theorem 10.15.5, let $E_1, E_2, \ldots, E_n$ be the elementary matrices that correspond to the row operations that
reduce C to I, so

$$E_n \cdots E_2 E_1 C = I$$

Show that

$$E_n \cdots E_2 E_1 P = (A^{-1})^T$$

from which it follows that the same sequence of row operations that reduces C to I converts P to $(A^{-1})^T$.

11. (a) If A is the enciphering matrix of a Hill n-cipher, show that

$$A^{-1} = (C^{-1}P)^T \pmod{26}$$

where C and P are the matrices defined in Theorem 10.15.5.

(b) Instead of using Theorem 10.15.5 as in the text, find the deciphering matrix $A^{-1}$ of Example 8 by using the result in
part (a) and Equation 2 to compute $C^{-1}$. [Note: Although this method is practical for Hill 2-ciphers, Theorem
10.15.5 is more efficient for Hill n-ciphers with n > 2.]


Section 10.15 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, 
Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with 
some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular 
utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. 
Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of 
the problems in the regular exercise sets. 


T1. Two integers that have no common factors (except 1) are said to be relatively prime. Given a positive integer n, let
$S_n = \{a_1, a_2, a_3, \ldots, a_m\}$, where $a_1 < a_2 < a_3 < \cdots < a_m$, be the set of all positive integers less than n and relatively
prime to n. For example, if n = 9, then

$$S_9 = \{a_1, a_2, a_3, \ldots, a_6\} = \{1, 2, 4, 5, 7, 8\}$$

(a) Construct a table consisting of n and $S_n$ for n = 2, 3, ..., 15, and then compute

$$\sum_{k=1}^{m} a_k \quad \text{and} \quad \left( \sum_{k=1}^{m} a_k \right) \pmod{n}$$

in each case. Draw a conjecture for n > 15 and prove your conjecture to be true. [Hint: Use the fact that if a is
relatively prime to n, then n − a is also relatively prime to n.]

(b) Given a positive integer n and the set $S_n$, let $P_n$ be the $m \times m$ matrix

$$P_n = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{m-1} & a_m \\ a_2 & a_3 & a_4 & \cdots & a_m & a_1 \\ a_3 & a_4 & a_5 & \cdots & a_1 & a_2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ a_{m-1} & a_m & a_1 & \cdots & a_{m-3} & a_{m-2} \\ a_m & a_1 & a_2 & \cdots & a_{m-2} & a_{m-1} \end{bmatrix}$$

so that, for example,

$$P_9 = \begin{bmatrix} 1 & 2 & 4 & 5 & 7 & 8 \\ 2 & 4 & 5 & 7 & 8 & 1 \\ 4 & 5 & 7 & 8 & 1 & 2 \\ 5 & 7 & 8 & 1 & 2 & 4 \\ 7 & 8 & 1 & 2 & 4 & 5 \\ 8 & 1 & 2 & 4 & 5 & 7 \end{bmatrix}$$

Use a computer to compute $\det(P_n)$ and $\det(P_n)$ (mod n) for n = 2, 3, ..., 15, and then use these results to construct a
conjecture.

(c) Use the results of part (a) to prove your conjecture to be true. [Hint: Add the first m − 1 rows of $P_n$ to its last row and
then use Theorem 2.2.3.] What do these results imply about the inverse of $P_n$ (mod n)?


T2. Given a positive integer n greater than 1, the number of positive integers less than n and relatively prime to n is called
the Euler phi function of n and is denoted by $\varphi(n)$. For example, $\varphi(6) = 2$ since only two positive integers (1 and 5) are
less than 6 and have no common factor with 6.

(a) Using a computer, for each value of n = 2, 3, ..., 25 compute and print out all positive integers that are less than n and
relatively prime to n. Then use these integers to determine the values of $\varphi(n)$ for n = 2, 3, ..., 25. Can you discover a
pattern in the results?

(b) It can be shown that if $\{p_1, p_2, p_3, \ldots, p_m\}$ are all the distinct prime factors of n, then

$$\varphi(n) = n \left(1 - \frac{1}{p_1}\right) \left(1 - \frac{1}{p_2}\right) \left(1 - \frac{1}{p_3}\right) \cdots \left(1 - \frac{1}{p_m}\right)$$

For example, since {2, 3} are the distinct prime factors of 12, we have

$$\varphi(12) = 12 \left(1 - \frac{1}{2}\right) \left(1 - \frac{1}{3}\right) = 4$$

which agrees with the fact that {1, 5, 7, 11} are the only positive integers less than 12 and relatively prime to 12.
Using a computer, print out all the prime factors of n for n = 2, 3, ..., 25. Then compute $\varphi(n)$ using the formula above
and compare it to your results in part (a).




10.16 Genetics 


In this section we investigate the propagation of an inherited trait in successive generations by computing 
powers of a matrix. 


Prerequisites 


Eigenvalues and Eigenvectors 
Diagonalization of a Matrix 


Intuitive Understanding of Limits 


Inheritance Traits 


In this section we examine the inheritance of traits in animals or plants. The inherited trait under consideration 
is assumed to be governed by a set of two genes, which we designate by A and a. Under autosomal 
inheritance each individual in the population of either gender possesses two of these genes, the possible 
pairings being designated AA, Aa, and aa. This pair of genes is called the individual's genotype, and it 
determines how the trait controlled by the genes is manifested in the individual. For example, in snapdragons 
a set of two genes determines the color of the flower. Genotype AA produces red flowers, genotype Aa 
produces pink flowers, and genotype aa produces white flowers. In humans, eye coloration is controlled 
through autosomal inheritance. Genotypes AA and Aa have brown eyes, and genotype aa has blue eyes. In this
case we say that gene A dominates gene a, or that gene a is recessive to gene A, because genotype Aa has the
same outward trait as genotype AA.


In addition to autosomal inheritance we will also discuss X-linked inheritance. In this type of inheritance, the 
male of the species possesses only one of the two possible genes (A or a), and the female possesses a pair of 
the two genes (AA, aa, or Aa). In humans, color blindness, hereditary baldness, hemophilia, and muscular 
dystrophy, to name a few, are traits controlled by X-linked inheritance. 


Below we explain the manner in which the genes of the parents are passed on to their offspring for the two 
types of inheritance. We construct matrix models that give the probable genotypes of the offspring in terms of 
the genotypes of the parents, and we use these matrix models to follow the genotype distribution of a 
population through successive generations. 


Autosomal Inheritance 


In autosomal inheritance an individual inherits one gene from each of its parents' pairs of genes to form its 
own particular pair. As far as we know, it is a matter of chance which of the two genes a parent passes on to 
the offspring. Thus, if one parent is of genotype Aa, it is equally likely that the offspring will inherit the A
gene or the a gene from that parent. If one parent is of genotype aa and the other parent is of genotype Aa, the
offspring will always receive an a gene from the aa parent and will receive either an A gene or an a gene, with
equal probability, from the Aa parent. Consequently, each of the offspring has equal probability of being
genotype aa or Aa. In Table 1 we list the probabilities of the possible genotypes of the offspring for all
possible combinations of the genotypes of the parents.


Table 1

Genotype of      Genotypes of Parents
Offspring        AA-AA   AA-Aa   AA-aa   Aa-Aa   Aa-aa   aa-aa
AA               1       1/2     0       1/4     0       0
Aa               0       1/2     1       1/2     1/2     0
aa               0       0       0       1/4     1/2     1


EXAMPLE 1 Distribution of Genotypes in a Population


Suppose that a farmer has a large population of plants consisting of some distribution of all
three possible genotypes AA, Aa, and aa. The farmer desires to undertake a breeding program in
which each plant in the population is always fertilized with a plant of genotype AA and is then
replaced by one of its offspring. We want to derive an expression for the distribution of the
three possible genotypes in the population after any number of generations.


For n = 0, 1, 2, ..., let us set

$a_n$ = fraction of plants of genotype AA in nth generation
$b_n$ = fraction of plants of genotype Aa in nth generation
$c_n$ = fraction of plants of genotype aa in nth generation

Thus $a_0$, $b_0$, and $c_0$ specify the initial distribution of the genotypes. We also have that

$$a_n + b_n + c_n = 1 \quad \text{for } n = 0, 1, 2, \ldots$$


From Table 1 we can determine the genotype distribution of each generation from the genotype 
distribution of the preceding generation by the following equations: 


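$$\begin{aligned} a_n &= a_{n-1} + \tfrac{1}{2} b_{n-1} \\ b_n &= c_{n-1} + \tfrac{1}{2} b_{n-1} \\ c_n &= 0 \end{aligned} \qquad n = 1, 2, \ldots \qquad (1)$$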


For example, the first of these three equations states that all the offspring of a plant of genotype 
AA will be of genotype AA under this breeding program and that half of the offspring of a plant 
of genotype Aa will be of genotype AA. 


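Equations 1 can be written in matrix notation as

$$\mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}, \quad n = 1, 2, \ldots \qquad (2)$$

where

$$\mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \\ c_n \end{bmatrix}, \quad \mathbf{x}^{(n-1)} = \begin{bmatrix} a_{n-1} \\ b_{n-1} \\ c_{n-1} \end{bmatrix}, \quad \text{and} \quad M = \begin{bmatrix} 1 & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} & 1 \\ 0 & 0 & 0 \end{bmatrix}$$

Note that the three columns of the matrix M are the same as the first three columns of Table 1.
From Equation 2 it follows that

$$\mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)} = M^2\mathbf{x}^{(n-2)} = \cdots = M^n\mathbf{x}^{(0)} \qquad (3)$$

Consequently, if we can find an explicit expression for $M^n$, we can use 3 to obtain an explicit
expression for $\mathbf{x}^{(n)}$. To find an explicit expression for $M^n$, we first diagonalize M. That is, we
find an invertible matrix P and a diagonal matrix D such that

$$M = PDP^{-1} \qquad (4)$$

With such a diagonalization, we then have (see Exercise 1)

$$M^n = PD^nP^{-1} \quad \text{for } n = 1, 2, \ldots$$

where

$$D^n = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_k \end{bmatrix}^n = \begin{bmatrix} \lambda_1^n & 0 & \cdots & 0 \\ 0 & \lambda_2^n & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_k^n \end{bmatrix}$$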

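The diagonalization of M is accomplished by finding its eigenvalues and corresponding
eigenvectors. These are as follows (verify):

Eigenvalues: $\lambda_1 = 1, \quad \lambda_2 = \tfrac{1}{2}, \quad \lambda_3 = 0$

Corresponding eigenvectors: $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}$

Thus, in Equation 4 we have

$$D = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

and

$$P = [\mathbf{v}_1 \mid \mathbf{v}_2 \mid \mathbf{v}_3] = \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 1 \end{bmatrix}$$

Therefore,

$$\mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \left(\tfrac{1}{2}\right)^n & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix}$$

or

$$\mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \\ c_n \end{bmatrix} = \begin{bmatrix} 1 & 1 - \left(\tfrac{1}{2}\right)^n & 1 - \left(\tfrac{1}{2}\right)^{n-1} \\ 0 & \left(\tfrac{1}{2}\right)^n & \left(\tfrac{1}{2}\right)^{n-1} \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix} = \begin{bmatrix} a_0 + b_0 + c_0 - \left(\tfrac{1}{2}\right)^n b_0 - \left(\tfrac{1}{2}\right)^{n-1} c_0 \\ \left(\tfrac{1}{2}\right)^n b_0 + \left(\tfrac{1}{2}\right)^{n-1} c_0 \\ 0 \end{bmatrix}$$

Using the fact that $a_0 + b_0 + c_0 = 1$, we thus have

$$\begin{aligned} a_n &= 1 - \left(\tfrac{1}{2}\right)^n b_0 - \left(\tfrac{1}{2}\right)^{n-1} c_0 \\ b_n &= \left(\tfrac{1}{2}\right)^n b_0 + \left(\tfrac{1}{2}\right)^{n-1} c_0 \\ c_n &= 0 \end{aligned} \qquad n = 1, 2, \ldots \qquad (5)$$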

These are explicit formulas for the fractions of the three genotypes in the nth generation of
plants in terms of the initial genotype fractions.

Because $\left(\tfrac{1}{2}\right)^n$ tends to zero as n approaches infinity, it follows from these equations that

$$a_n \to 1, \qquad b_n \to 0, \qquad c_n = 0$$

as n approaches infinity. That is, in the limit all plants in the population will be genotype AA.
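
The closed-form result of Example 1 can be checked numerically by iterating the transition matrix directly. The sketch below uses exact fractions; the initial distribution `x0` is a made-up illustration and the function names are hypothetical.

```python
from fractions import Fraction as F

# Transition matrix for fertilization with genotype AA
# (the first three columns of Table 1).
M = [[F(1), F(1, 2), F(0)],
     [F(0), F(1, 2), F(1)],
     [F(0), F(0),    F(0)]]


def step(M, x):
    """One generation: return M times x."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


def closed_form(n, a0, b0, c0):
    """The explicit formulas for a_n, b_n, c_n with n >= 1."""
    half_n = F(1, 2) ** n
    half_n1 = F(1, 2) ** (n - 1)
    return [1 - half_n * b0 - half_n1 * c0, half_n * b0 + half_n1 * c0, F(0)]


x0 = [F(1, 5), F(1, 2), F(3, 10)]   # hypothetical initial fractions, summing to 1
x = x0
for n in range(1, 11):
    x = step(M, x)
    assert x == closed_form(n, *x0)  # iteration and formula agree exactly

print([float(v) for v in x])         # close to [1, 0, 0] after 10 generations
```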


EXAMPLE 2 Modifying Example 1


We can modify Example 1 so that instead of each plant being fertilized with one of genotype
AA, each plant is fertilized with a plant of its own genotype. Using the same notation as in
Example 1, we then find

$$\mathbf{x}^{(n)} = M^n\mathbf{x}^{(0)}$$

where

$$M = \begin{bmatrix} 1 & \tfrac{1}{4} & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{4} & 1 \end{bmatrix}$$

The columns of this new matrix M are the same as the columns of Table 1 corresponding to
parents with genotypes AA-AA, Aa-Aa, and aa-aa.


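The eigenvalues of M are (verify)

$$\lambda_1 = 1, \quad \lambda_2 = 1, \quad \lambda_3 = \tfrac{1}{2}$$

The eigenvalue $\lambda_1 = 1$ has multiplicity two and its corresponding eigenspace is
two-dimensional. Picking two linearly independent eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in that eigenspace,
and a single eigenvector $\mathbf{v}_3$ for the simple eigenvalue $\lambda_3 = \tfrac{1}{2}$, we have (verify)

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}$$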

The calculations for x are then 


x? — MW" — ppp y,O 


1 10 
PO flea. ; ao 
= |/0 0 =2 n{[S 5 11/40 
01 i}foo (5) eG 
2} lilo —1 0 
2 
1 pyr! 
ey 0 
1" . 
= |o (2) 0|| dp 


Thus, 


I 
R 
Oo 
| 
| 
in 
ho) 
a 
= 
— 
_— 
a | 
oO 
Oo 


ay 
1 nv 
— (2) bi ¢=19 (6) 
n+l 
= 0+ |3-(2)"" fo 
4] 
In the limit, as n tends to infinity, (3 — 0 and (1) _, 0), So 
1 
ay, — aot 520 
b, — O 
1, 
Cy — cot 5 Q 


Thus, fertilization of each plant with one of its own genotype produces a population that in the 
limit contains only genotypes AA and aa. 
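
As with Example 1, the formulas of Example 2 can be compared with a direct iteration of the transition matrix. This is a minimal sketch under stated assumptions: the initial distribution `x0` is hypothetical, the function names are illustrative, and `formula_6` encodes the explicit formulas as $a_n = a_0 + [\tfrac12 - (\tfrac12)^{n+1}] b_0$, $b_n = (\tfrac12)^n b_0$, $c_n = c_0 + [\tfrac12 - (\tfrac12)^{n+1}] b_0$.

```python
from fractions import Fraction as F

# Transition matrix for self-genotype fertilization
# (the AA-AA, Aa-Aa, and aa-aa columns of Table 1).
M = [[F(1), F(1, 4), F(0)],
     [F(0), F(1, 2), F(0)],
     [F(0), F(1, 4), F(1)]]


def next_generation(x):
    """One generation: return M times x."""
    return [sum(M[i][j] * x[j] for j in range(3)) for i in range(3)]


def formula_6(n, a0, b0, c0):
    """Explicit formulas for a_n, b_n, c_n with n >= 1."""
    shift = F(1, 2) - F(1, 2) ** (n + 1)
    return [a0 + shift * b0, F(1, 2) ** n * b0, c0 + shift * b0]


x0 = [F(1, 4), F(1, 2), F(1, 4)]    # hypothetical initial fractions
x = x0
for n in range(1, 21):
    x = next_generation(x)
    assert x == formula_6(n, *x0)   # iteration and formula agree exactly

print([float(v) for v in x])        # approaches [a0 + b0/2, 0, c0 + b0/2] = [0.5, 0, 0.5]
```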


Autosomal Recessive Diseases 


There are many genetic diseases governed by autosomal inheritance in which a normal gene A dominates an 
abnormal gene a. Genotype AA is a normal individual; genotype Aa is a carrier of the disease but is not 
afflicted with the disease; and genotype aa is afflicted with the disease. In humans such genetic diseases are 
often associated with a particular racial group—for instance, cystic fibrosis (predominant among Caucasians), 
sickle-cell anemia (predominant among people of African origin), Cooley's anemia (predominant among 
people of Mediterranean origin), and Tay-Sachs disease (predominant among Eastern European Jews). 


Suppose that an animal breeder has a population of animals that carries an autosomal recessive disease. 
Suppose further that those animals afflicted with the disease do not survive to maturity. One possible way to 
control such a disease is for the breeder to always mate a female, regardless of her genotype, with a normal 
male. In this way, all future offspring will either have a normal father and a normal mother (AA-AA matings)
or a normal father and a carrier mother (AA-Aa matings). There can be no AA-aa matings since animals of
genotype aa do not survive to maturity. Under this type of mating program no future offspring will be 
afflicted with the disease, although there will still be carriers in future generations. Let us now determine the 
fraction of carriers in future generations. We set

$$\mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \end{bmatrix}, \quad n = 1, 2, \ldots$$

where

$a_n$ = fraction of population of genotype AA in nth generation
$b_n$ = fraction of population of genotype Aa (carriers) in nth generation

Because each offspring has at least one normal parent, we may consider the controlled mating program as one
of continual mating with genotype AA, as in Example 1. Thus, the transition of genotype distributions from
one generation to the next is governed by the equation

$$\mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}, \quad n = 1, 2, \ldots$$

where

$$M = \begin{bmatrix} 1 & \tfrac{1}{2} \\ 0 & \tfrac{1}{2} \end{bmatrix}$$

Because we know the initial distribution $\mathbf{x}^{(0)}$, the distribution of genotypes in the nth generation is thus given
by

$$\mathbf{x}^{(n)} = M^n\mathbf{x}^{(0)}, \quad n = 1, 2, \ldots$$


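The diagonalization of M is easily carried out (see Exercise 4) and leads to

$$\mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)} = \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & \left(\tfrac{1}{2}\right)^n \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \end{bmatrix} = \begin{bmatrix} a_0 + b_0 - \left(\tfrac{1}{2}\right)^n b_0 \\ \left(\tfrac{1}{2}\right)^n b_0 \end{bmatrix}$$

Because $a_0 + b_0 = 1$, we have

$$\begin{aligned} a_n &= 1 - \left(\tfrac{1}{2}\right)^n b_0 \\ b_n &= \left(\tfrac{1}{2}\right)^n b_0 \end{aligned} \qquad n = 1, 2, \ldots \qquad (7)$$

Thus, as n tends to infinity, we have

$$a_n \to 1, \qquad b_n \to 0$$

so in the limit there will be no carriers in the population.

From 7 we see that

$$b_n = \tfrac{1}{2} b_{n-1}, \quad n = 1, 2, \ldots \qquad (8)$$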


That is, the fraction of carriers in each generation is one-half the fraction of carriers in the preceding 
generation. It would be of interest also to investigate the propagation of carriers under random mating, when 
two animals mate without regard to their genotypes. Unfortunately, such random mating leads to nonlinear 
equations, and the techniques of this section are not applicable. However, by other techniques it can be shown
that under random mating, Equation 8 is replaced by

$$b_n = \frac{b_{n-1}}{1 + \tfrac{1}{2} b_{n-1}}, \quad n = 1, 2, \ldots \qquad (9)$$


As a numerical example, suppose that the breeder starts with a population in which 10% of the animals are
carriers. Under the controlled-mating program governed by Equation 8, the percentage of carriers can be
reduced to 5% in one generation. But under random mating, Equation 9 predicts that 9.5% of the population
will be carriers after one generation ($b_n = .095$ if $b_{n-1} = .10$). In addition, under controlled mating no
offspring will ever be afflicted with the disease, but with random mating it can be shown that about 1 in 400
offspring will be born with the disease when 10% of the population are carriers.
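
The two carrier recurrences can be compared numerically. The sketch below assumes that Equation 8 reads $b_n = \tfrac12 b_{n-1}$ and that Equation 9 reads $b_n = b_{n-1} / (1 + \tfrac12 b_{n-1})$, which is consistent with the 9.5% figure quoted above. The function names are hypothetical.

```python
def carriers_controlled(b, generations):
    """Controlled mating: the carrier fraction halves each generation."""
    for _ in range(generations):
        b = b / 2
    return b


def carriers_random(b, generations):
    """Random mating: b_n = b_(n-1) / (1 + b_(n-1) / 2)."""
    for _ in range(generations):
        b = b / (1 + b / 2)
    return b


print(carriers_controlled(0.10, 1))   # 0.05
print(carriers_random(0.10, 1))       # about 0.0952

# Starting from 25% carriers, compare the two programs generation by generation.
for n in (1, 4, 8, 12):
    print(n, carriers_controlled(0.25, n), carriers_random(0.25, n))
```

Under random mating the reciprocal $1/b_n$ grows by only $\tfrac12$ per generation, so the decline is far slower than the geometric halving of the controlled program.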


X-Linked Inheritance 


As mentioned in the introduction, in X-linked inheritance the male possesses one gene (A or a) and the female 
possesses two genes (AA, Aa, or aa). The term X-linked is used because such genes are found on the 
X-chromosome, of which the male has one and the female has two. The inheritance of such genes is as 
follows: A male offspring receives one of his mother's two genes with equal probability, and a female 
offspring receives the one gene of her father and one of her mother's two genes with equal probability. 
Readers familiar with basic probability can verify that this type of inheritance leads to the genotype 
probabilities in Table 2. 


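Table 2

                     Genotypes of Parents (Father, Mother)
Offspring            (A, AA)   (A, Aa)   (A, aa)   (a, AA)   (a, Aa)   (a, aa)
Male     A           1         1/2       0         1         1/2       0
Male     a           0         1/2       1         0         1/2       1
Female   AA          1         1/2       0         0         0         0
Female   Aa          0         1/2       1         1         1/2       0
Female   aa          0         0         0         0         1/2       1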


We will discuss a program of inbreeding in connection with X-linked inheritance. We begin with a male and 
female; select two of their offspring at random, one of each gender, and mate them; select two of the resulting 
offspring and mate them; and so forth. Such inbreeding is commonly performed with animals. (Among 
humans, such brother-sister marriages were used by the rulers of ancient Egypt to keep the royal line pure.) 


The original male-female pair can be one of the six types, corresponding to the six columns of Table 2: 
(A, AA), (A, Aa), (A, aa), (a, AA), (a, Aa), (a, aa) 

The sibling pairs mated in each successive generation have certain probabilities of being one of these six 

types. To compute these probabilities, for n = 0, 1, 2, ..., let us set

$a_n$ = probability sibling-pair mated in nth generation is type (A, AA)
$b_n$ = probability sibling-pair mated in nth generation is type (A, Aa)
$c_n$ = probability sibling-pair mated in nth generation is type (A, aa)
$d_n$ = probability sibling-pair mated in nth generation is type (a, AA)
$e_n$ = probability sibling-pair mated in nth generation is type (a, Aa)
$f_n$ = probability sibling-pair mated in nth generation is type (a, aa)

With these probabilities we form a column vector

$$\mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \\ c_n \\ d_n \\ e_n \\ f_n \end{bmatrix}, \quad n = 0, 1, 2, \ldots$$

From Table 2 it follows that

$$\mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}, \quad n = 1, 2, \ldots \qquad (10)$$


where

$$M = \begin{bmatrix} 1 & \tfrac{1}{4} & 0 & 0 & 0 & 0 \\ 0 & \tfrac{1}{4} & 0 & 1 & \tfrac{1}{4} & 0 \\ 0 & 0 & 0 & 0 & \tfrac{1}{4} & 0 \\ 0 & \tfrac{1}{4} & 0 & 0 & 0 & 0 \\ 0 & \tfrac{1}{4} & 1 & 0 & \tfrac{1}{4} & 0 \\ 0 & 0 & 0 & 0 & \tfrac{1}{4} & 1 \end{bmatrix}$$

The rows and the columns of M are both labeled, in order, by the pair types (A, AA), (A, Aa), (A, aa), (a, AA), (a, Aa), (a, aa).

For example, suppose that in the (n − 1)-st generation, the sibling pair mated is type (A, Aa). Then their
male offspring will be genotype A or a with equal probability, and their female offspring will be genotype AA
or Aa with equal probability. Because one of the male offspring and one of the female offspring are chosen at
random for mating, the next sibling pair will be one of type (A, AA), (A, Aa), (a, AA), or (a, Aa) with
equal probability. Thus, the second column of M contains $\tfrac{1}{4}$ in each of the four rows corresponding to these
four sibling pairs. (See Exercise 9 for the remaining columns.)

As in our previous examples, it follows from 10 that

$$\mathbf{x}^{(n)} = M^n\mathbf{x}^{(0)}, \quad n = 1, 2, \ldots \qquad (11)$$


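After lengthy calculations, the eigenvalues and eigenvectors of M turn out to be

$$\lambda_1 = 1, \quad \lambda_2 = 1, \quad \lambda_3 = \tfrac{1}{2}, \quad \lambda_4 = -\tfrac{1}{2}, \quad \lambda_5 = \tfrac{1}{4}(1 + \sqrt{5}), \quad \lambda_6 = \tfrac{1}{4}(1 - \sqrt{5})$$

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} -1 \\ 2 \\ -1 \\ 1 \\ -2 \\ 1 \end{bmatrix}, \quad \mathbf{v}_4 = \begin{bmatrix} 1 \\ -6 \\ -3 \\ 3 \\ 6 \\ -1 \end{bmatrix}$$

$$\mathbf{v}_5 = \begin{bmatrix} \tfrac{1}{4}(-3 - \sqrt{5}) \\ 1 \\ \tfrac{1}{4}(-1 + \sqrt{5}) \\ \tfrac{1}{4}(-1 + \sqrt{5}) \\ 1 \\ \tfrac{1}{4}(-3 - \sqrt{5}) \end{bmatrix}, \quad \mathbf{v}_6 = \begin{bmatrix} \tfrac{1}{4}(-3 + \sqrt{5}) \\ 1 \\ \tfrac{1}{4}(-1 - \sqrt{5}) \\ \tfrac{1}{4}(-1 - \sqrt{5}) \\ 1 \\ \tfrac{1}{4}(-3 + \sqrt{5}) \end{bmatrix}$$

The diagonalization of M then leads to

$$\mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)}, \quad n = 1, 2, \ldots \qquad (12)$$

where

$$P = \begin{bmatrix} 1 & 0 & -1 & 1 & \tfrac{1}{4}(-3 - \sqrt{5}) & \tfrac{1}{4}(-3 + \sqrt{5}) \\ 0 & 0 & 2 & -6 & 1 & 1 \\ 0 & 0 & -1 & -3 & \tfrac{1}{4}(-1 + \sqrt{5}) & \tfrac{1}{4}(-1 - \sqrt{5}) \\ 0 & 0 & 1 & 3 & \tfrac{1}{4}(-1 + \sqrt{5}) & \tfrac{1}{4}(-1 - \sqrt{5}) \\ 0 & 0 & -2 & 6 & 1 & 1 \\ 0 & 1 & 1 & -1 & \tfrac{1}{4}(-3 - \sqrt{5}) & \tfrac{1}{4}(-3 + \sqrt{5}) \end{bmatrix}$$

$$D^n = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & \left(\tfrac{1}{2}\right)^n & 0 & 0 & 0 \\ 0 & 0 & 0 & \left(-\tfrac{1}{2}\right)^n & 0 & 0 \\ 0 & 0 & 0 & 0 & \left[\tfrac{1}{4}(1 + \sqrt{5})\right]^n & 0 \\ 0 & 0 & 0 & 0 & 0 & \left[\tfrac{1}{4}(1 - \sqrt{5})\right]^n \end{bmatrix}$$

$$P^{-1} = \begin{bmatrix} 1 & \tfrac{2}{3} & \tfrac{1}{3} & \tfrac{2}{3} & \tfrac{1}{3} & 0 \\ 0 & \tfrac{1}{3} & \tfrac{2}{3} & \tfrac{1}{3} & \tfrac{2}{3} & 1 \\ 0 & \tfrac{1}{8} & -\tfrac{1}{4} & \tfrac{1}{4} & -\tfrac{1}{8} & 0 \\ 0 & -\tfrac{1}{24} & -\tfrac{1}{12} & \tfrac{1}{12} & \tfrac{1}{24} & 0 \\ 0 & \tfrac{1}{20}(5 + \sqrt{5}) & \tfrac{1}{5}\sqrt{5} & \tfrac{1}{5}\sqrt{5} & \tfrac{1}{20}(5 + \sqrt{5}) & 0 \\ 0 & \tfrac{1}{20}(5 - \sqrt{5}) & -\tfrac{1}{5}\sqrt{5} & -\tfrac{1}{5}\sqrt{5} & \tfrac{1}{20}(5 - \sqrt{5}) & 0 \end{bmatrix}$$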


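We will not write out the matrix product in 12, as it is rather unwieldy. However, if a specific vector $\mathbf{x}^{(0)}$ is
given, the calculation for $\mathbf{x}^{(n)}$ is not too cumbersome (see Exercise 6).

Because the absolute values of the last four diagonal entries of D are less than 1, we see that as n tends to
infinity,

$$D^n \to \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

And so, from Equation 12,

$$\mathbf{x}^{(n)} \to P \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} P^{-1}\mathbf{x}^{(0)}$$

Performing the matrix multiplication on the right, we obtain (verify)

$$\mathbf{x}^{(n)} \to \begin{bmatrix} a_0 + \tfrac{2}{3} b_0 + \tfrac{1}{3} c_0 + \tfrac{2}{3} d_0 + \tfrac{1}{3} e_0 \\ 0 \\ 0 \\ 0 \\ 0 \\ f_0 + \tfrac{1}{3} b_0 + \tfrac{2}{3} c_0 + \tfrac{1}{3} d_0 + \tfrac{2}{3} e_0 \end{bmatrix} \qquad (13)$$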

That is, in the limit all sibling pairs will be either type (A, AA) or type (a, aa). For example, if the initial
parents are type (A, Aa) (that is, $b_0 = 1$ and $a_0 = c_0 = d_0 = e_0 = f_0 = 0$), then as n tends to infinity,

$$\mathbf{x}^{(n)} \to \begin{bmatrix} \tfrac{2}{3} \\ 0 \\ 0 \\ 0 \\ 0 \\ \tfrac{1}{3} \end{bmatrix}$$

Thus, in the limit there is probability $\tfrac{2}{3}$ that the sibling pairs will be (A, AA), and probability $\tfrac{1}{3}$ that they will
be (a, aa).
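
The limiting behavior can also be observed without the diagonalization, simply by applying the transition matrix repeatedly. The sketch below assumes that the columns of M are the one-generation transition probabilities for the pair types in the order (A, AA), (A, Aa), (A, aa), (a, AA), (a, Aa), (a, aa), as derived from the inheritance rules stated above. The function name is hypothetical and 200 generations is an arbitrary cutoff.

```python
from fractions import Fraction as F

q = F(1, 4)
# Column j gives the probabilities of the next sibling-pair type, given that the
# current pair is type j.
M = [[1, q, 0, 0, 0, 0],
     [0, q, 0, 1, q, 0],
     [0, 0, 0, 0, q, 0],
     [0, q, 0, 0, 0, 0],
     [0, q, 1, 0, q, 0],
     [0, 0, 0, 0, q, 1]]


def next_pair_distribution(x):
    """One generation of inbreeding: return M times x."""
    return [sum(M[i][j] * x[j] for j in range(6)) for i in range(6)]


x = [F(0), F(1), F(0), F(0), F(0), F(0)]   # initial parents of type (A, Aa)
for _ in range(200):
    x = next_pair_distribution(x)

print([round(float(v), 6) for v in x])      # close to [2/3, 0, 0, 0, 0, 1/3]
```

Convergence is governed by the largest eigenvalue of absolute value less than 1, namely $\tfrac14(1 + \sqrt5) \approx 0.809$, so a few hundred generations are more than enough for the printed precision.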


Exercise Set 10.16 


1. Show that if $M = PDP^{-1}$, then $M^n = PD^nP^{-1}$ for n = 1, 2, ....


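2. In Example 1 suppose that the plants are always fertilized with a plant of genotype Aa rather than one of
genotype AA. Derive formulas for the fractions of the plants of genotypes AA, Aa, and aa in the nth
generation. Also, find the limiting genotype distribution as n tends to infinity.

Answer:

$$\begin{aligned} a_n &= \tfrac{1}{4} + \left(\tfrac{1}{2}\right)^{n+1}(a_0 - c_0) \\ b_n &= \tfrac{1}{2} \\ c_n &= \tfrac{1}{4} - \left(\tfrac{1}{2}\right)^{n+1}(a_0 - c_0) \end{aligned} \qquad n = 1, 2, \ldots$$

$$a_n \to \tfrac{1}{4}, \qquad b_n \to \tfrac{1}{2}, \qquad c_n \to \tfrac{1}{4} \qquad \text{as } n \to \infty$$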


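3. In Example 1 suppose that the initial plants are fertilized with genotype AA, the first generation is
fertilized with genotype Aa, the second generation is fertilized with genotype AA, and this alternating
pattern of fertilization is kept up. Find formulas for the fractions of the plants of genotypes AA, Aa, and aa
in the nth generation.

Answer:

$$\begin{aligned} a_{2n+1} &= \tfrac{2}{3} + \frac{1}{6 \cdot 4^n}(2a_0 - b_0 - 4c_0) \\ b_{2n+1} &= \tfrac{1}{3} - \frac{1}{6 \cdot 4^n}(2a_0 - b_0 - 4c_0) \\ c_{2n+1} &= 0 \end{aligned} \qquad n = 0, 1, 2, \ldots$$

$$\begin{aligned} a_{2n} &= \tfrac{5}{12} + \frac{1}{6 \cdot 4^n}(2a_0 - b_0 - 4c_0) \\ b_{2n} &= \tfrac{1}{2} \\ c_{2n} &= \tfrac{1}{12} - \frac{1}{6 \cdot 4^n}(2a_0 - b_0 - 4c_0) \end{aligned} \qquad n = 1, 2, \ldots$$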


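4. In the section on autosomal recessive diseases, find the eigenvalues and eigenvectors of the matrix M and
verify Equation 7.

Answer:

Eigenvalues: $\lambda_1 = 1, \ \lambda_2 = \tfrac{1}{2}$; eigenvectors: $\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \ \mathbf{e}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$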

5. Suppose that a breeder has an animal population in which 25% of the population are carriers of an
autosomal recessive disease. If the breeder allows the animals to mate irrespective of their genotype, use 
Equation 9 to calculate the number of generations required for the percentage of carriers to fall from 25% 
to 10%. If the breeder instead implements the controlled-mating program determined by Equation 8, what 
will the percentage of carriers be after the same number of generations? 


Answer: 


12 generations; .006% 


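6. In the section on X-linked inheritance, suppose that the initial parents are equally likely to be of any of the
six possible genotype parents; that is,

$$\mathbf{x}^{(0)} = \begin{bmatrix} \tfrac{1}{6} \\ \tfrac{1}{6} \\ \tfrac{1}{6} \\ \tfrac{1}{6} \\ \tfrac{1}{6} \\ \tfrac{1}{6} \end{bmatrix}$$

Using Equation 12, calculate $\mathbf{x}^{(n)}$ and also calculate the limit of $\mathbf{x}^{(n)}$ as n tends to infinity.

Answer:

$$\mathbf{x}^{(n)} = \begin{bmatrix} \tfrac{1}{2} + \dfrac{1}{3 \cdot 4^{n+2}} \left[ (-3 - \sqrt{5})(1 + \sqrt{5})^{n+1} + (-3 + \sqrt{5})(1 - \sqrt{5})^{n+1} \right] \\ \dfrac{1}{3 \cdot 4^{n+1}} \left[ (1 + \sqrt{5})^{n+1} + (1 - \sqrt{5})^{n+1} \right] \\ \dfrac{1}{3 \cdot 4^{n+1}} \left[ (1 + \sqrt{5})^{n} + (1 - \sqrt{5})^{n} \right] \\ \dfrac{1}{3 \cdot 4^{n+1}} \left[ (1 + \sqrt{5})^{n} + (1 - \sqrt{5})^{n} \right] \\ \dfrac{1}{3 \cdot 4^{n+1}} \left[ (1 + \sqrt{5})^{n+1} + (1 - \sqrt{5})^{n+1} \right] \\ \tfrac{1}{2} + \dfrac{1}{3 \cdot 4^{n+2}} \left[ (-3 - \sqrt{5})(1 + \sqrt{5})^{n+1} + (-3 + \sqrt{5})(1 - \sqrt{5})^{n+1} \right] \end{bmatrix}$$

$$\mathbf{x}^{(n)} \to \begin{bmatrix} \tfrac{1}{2} \\ 0 \\ 0 \\ 0 \\ 0 \\ \tfrac{1}{2} \end{bmatrix} \quad \text{as } n \to \infty$$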

7. From 13 show that under X-linked inheritance with inbreeding, the probability that the limiting sibling 
pairs will be of type (A, AA) is the same as the proportion of A genes in the initial population. 


8. In X-linked inheritance suppose that none of the females of genotype Aa survive to maturity. Under 
inbreeding the possible sibling pairs are then 


(A, AA), (A, aa), (a, AA), and (a, aa) 


Find the transition matrix that describes how the genotype distribution changes in one generation. 


Answer: 


oo oOo Ke 
oo oO 2 
oo oO 2 
—- OO & 


9. Derive the matrix M in Equation 10 from Table 2. 
Section 10.16 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be 
MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra 
software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to 
read the relevant documentation for the particular utility you are using. The goal of these exercises is to 
provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in 
these exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 


T1. 


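(a) Use a computer to verify that the eigenvalues and eigenvectors of

$$M = \begin{bmatrix}
1 & \tfrac14 & 0 & 0 & 0 & 0 \\
0 & \tfrac14 & 0 & 1 & \tfrac14 & 0 \\
0 & 0 & 0 & 0 & \tfrac14 & 0 \\
0 & \tfrac14 & 0 & 0 & 0 & 0 \\
0 & \tfrac14 & 1 & 0 & \tfrac14 & 0 \\
0 & 0 & 0 & 0 & \tfrac14 & 1
\end{bmatrix}$$

as given in the text are correct.

(b) Starting with $x^{(n)} = Mx^{(n-1)}$ and the assumption that
$$\lim_{n \to \infty} x^{(n)} = x$$
exists, we must have
$$\lim_{n \to \infty} x^{(n)} = M \lim_{n \to \infty} x^{(n-1)} \quad \text{or} \quad x = Mx$$

This suggests that $x$ can be solved directly using the equation $(M - I)x = 0$. Use a computer to solve the
equation $x = Mx$, where

$$x = \begin{bmatrix} a & b & c & d & e & f \end{bmatrix}^T$$

and $a + b + c + d + e + f = 1$; compare your results to Equation 13. Explain why the solution to
$(M - I)x = 0$ along with $a + b + c + d + e + f = 1$ is not specific enough to determine $\lim_{n \to \infty} x^{(n)}$.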


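T2.
(a) Given

$$P = \begin{bmatrix}
1 & 0 & -1 & 1 & \tfrac14(-3-\sqrt5) & \tfrac14(-3+\sqrt5) \\
0 & 0 & 2 & -6 & 1 & 1 \\
0 & 0 & -1 & -3 & \tfrac14(-1+\sqrt5) & \tfrac14(-1-\sqrt5) \\
0 & 0 & 1 & 3 & \tfrac14(-1+\sqrt5) & \tfrac14(-1-\sqrt5) \\
0 & 0 & -2 & 6 & 1 & 1 \\
0 & 1 & 1 & -1 & \tfrac14(-3-\sqrt5) & \tfrac14(-3+\sqrt5)
\end{bmatrix}$$

from Equation 12 and

$$\lim_{n \to \infty} D^n = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$

use a computer to show that

$$\lim_{n \to \infty} M^n = \begin{bmatrix}
1 & \tfrac23 & \tfrac13 & \tfrac23 & \tfrac13 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & \tfrac13 & \tfrac23 & \tfrac13 & \tfrac23 & 1
\end{bmatrix}$$

(b) Use a computer to calculate $M^n$ for $n = 10, 20, 30, 40, 50, 60, 70$, and then compare your results to the
limit in part (a).
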


10.17 Age-Specific Population Growth 


In this section we investigate, using the Leslie matrix model, the growth over time of a female population that 
is divided into age classes. We then determine the limiting age distribution and growth rate of the population. 


Prerequisites 


Eigenvalues and Eigenvectors 
Diagonalization of a Matrix 


Intuitive Understanding of Limits 


One of the most common models of population growth used by demographers is the so-called Leslie model 
developed in the 1940s. This model describes the growth of the female portion of a human or animal 
population. In this model the females are divided into age classes of equal duration. To be specific, suppose 
that the maximum age attained by any female in the population is $L$ years (or some other time unit) and we
divide the population into $n$ age classes. Then each class is $L/n$ years in duration. We label the age classes
according to Table 1. 


Table 1 


Age Class    Age Interval

1            [0, L/n)
2            [L/n, 2L/n)
3            [2L/n, 3L/n)
...          ...
n - 1        [(n - 2)L/n, (n - 1)L/n)
n            [(n - 1)L/n, L]


Suppose that we know the number of females in each of the $n$ classes at time $t = 0$. In particular, let there be
$x_1^{(0)}$ females in the first class, $x_2^{(0)}$ females in the second class, and so forth. With these $n$ numbers we form a
column vector:

$$x^{(0)} = \begin{bmatrix} x_1^{(0)} \\ x_2^{(0)} \\ \vdots \\ x_n^{(0)} \end{bmatrix}$$

We call this vector the initial age distribution vector.


As time progresses, the number of females within each of the n classes changes because of three biological 
processes: birth, death, and aging. By describing these three processes quantitatively, we will see how to 
project the initial age distribution vector into the future. 


The easiest way to study the aging process is to observe the population at discrete times, say,
$t_0, t_1, t_2, \ldots, t_k, \ldots$. The Leslie model requires that the duration between any two successive observation
times be the same as the duration of the age intervals. Therefore, we set

$$t_0 = 0, \quad t_1 = L/n, \quad t_2 = 2L/n, \quad \ldots, \quad t_k = kL/n, \quad \ldots$$

With this assumption, all females in the $(i+1)$-st class at time $t_{k+1}$ were in the $i$th class at time $t_k$.


The birth and death processes between two successive observation times can be described by means of the 
following demographic parameters: 


$a_i$ $(i = 1, 2, \ldots, n)$: The average number of daughters born to each female during the
time she is in the $i$th age class

$b_i$ $(i = 1, 2, \ldots, n-1)$: The fraction of females in the $i$th age class that can be expected to
survive and pass into the $(i+1)$-st age class

By their definitions, we have that
(i) $a_i \ge 0$ for $i = 1, 2, \ldots, n$
(ii) $0 < b_i \le 1$ for $i = 1, 2, \ldots, n-1$

Note that we do not allow any $b_i$ to equal zero, because then no females would survive beyond the $i$th age
class. We also assume that at least one $a_i$ is positive so that some births occur. Any age class for which the
corresponding value of $a_i$ is positive is called a fertile age class.


We next define the age distribution vector $x^{(k)}$ at time $t_k$ by

$$x^{(k)} = \begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \\ \vdots \\ x_n^{(k)} \end{bmatrix}$$

where $x_i^{(k)}$ is the number of females in the $i$th age class at time $t_k$. Now, at time $t_k$, the females in the first age
class are just those daughters born between times $t_{k-1}$ and $t_k$. Thus, we can write

{number of females in class 1 at time $t_k$}
  = {number of daughters born to females in class 1 between times $t_{k-1}$ and $t_k$}
  + {number of daughters born to females in class 2 between times $t_{k-1}$ and $t_k$}
  + ...
  + {number of daughters born to females in class $n$ between times $t_{k-1}$ and $t_k$}

or, mathematically,
$$x_1^{(k)} = a_1 x_1^{(k-1)} + a_2 x_2^{(k-1)} + \cdots + a_n x_n^{(k-1)} \tag{1}$$


The females in the $(i+1)$-st age class $(i = 1, 2, \ldots, n-1)$ at time $t_k$ are those females in the $i$th class at
time $t_{k-1}$ who are still alive at time $t_k$. Thus,

{number of females in class $i+1$ at time $t_k$}
  = {fraction of females in class $i$ who survive and pass into class $i+1$}
  x {number of females in class $i$ at time $t_{k-1}$}

or, mathematically,
$$x_{i+1}^{(k)} = b_i x_i^{(k-1)}, \quad i = 1, 2, \ldots, n-1 \tag{2}$$


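Using matrix notation, we can write Equations 1 and 2 as

$$\begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \\ x_3^{(k)} \\ \vdots \\ x_n^{(k)} \end{bmatrix} =
\begin{bmatrix}
a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\
b_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & b_2 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{n-1} & 0
\end{bmatrix}
\begin{bmatrix} x_1^{(k-1)} \\ x_2^{(k-1)} \\ x_3^{(k-1)} \\ \vdots \\ x_n^{(k-1)} \end{bmatrix}$$

or more compactly as

$$x^{(k)} = Lx^{(k-1)}, \quad k = 1, 2, \ldots \tag{3}$$

where $L$ is the Leslie matrix

$$L = \begin{bmatrix}
a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\
b_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & b_2 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{n-1} & 0
\end{bmatrix} \tag{4}$$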


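From Equation 3 it follows that

$$\begin{aligned}
x^{(1)} &= Lx^{(0)} \\
x^{(2)} &= Lx^{(1)} = L^2 x^{(0)} \\
x^{(3)} &= Lx^{(2)} = L^3 x^{(0)} \\
&\;\;\vdots \\
x^{(k)} &= Lx^{(k-1)} = L^k x^{(0)}
\end{aligned} \tag{5}$$

Thus, if we know the initial age distribution $x^{(0)}$ and the Leslie matrix $L$, we can determine the female age
distribution at any later time.


EXAMPLE 1 Female Age Distribution for Animals


Suppose that the oldest age attained by the females in a certain animal population is 15 years
and we divide the population into three age classes with equal durations of five years. Let the
Leslie matrix for this population be

$$L = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix}$$

If there are initially 1000 females in each of the three age classes, then from Equation 3 we
have

$$x^{(0)} = \begin{bmatrix} 1000 \\ 1000 \\ 1000 \end{bmatrix}$$

$$x^{(1)} = Lx^{(0)} = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix}
\begin{bmatrix} 1000 \\ 1000 \\ 1000 \end{bmatrix} = \begin{bmatrix} 7000 \\ 500 \\ 250 \end{bmatrix}$$

$$x^{(2)} = Lx^{(1)} = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix}
\begin{bmatrix} 7000 \\ 500 \\ 250 \end{bmatrix} = \begin{bmatrix} 2750 \\ 3500 \\ 125 \end{bmatrix}$$

$$x^{(3)} = Lx^{(2)} = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix}
\begin{bmatrix} 2750 \\ 3500 \\ 125 \end{bmatrix} = \begin{bmatrix} 14{,}375 \\ 1375 \\ 875 \end{bmatrix}$$

Thus, after 15 years there are 14,375 females between 0 and 5 years of age, 1375 females
between 5 and 10 years of age, and 875 females between 10 and 15 years of age.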
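The repeated multiplication $x^{(k)} = Lx^{(k-1)}$ can be sketched in a few lines of Python. This is a minimal illustration rather than part of the text: the helper name `leslie_step` is an assumption, and exact fractions are used so that the figures agree with the hand computation in Example 1.

```python
from fractions import Fraction as F

def leslie_step(L, x):
    """Return the product L x for a square matrix L (list of rows) and a vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]

# Leslie matrix of Example 1: birth parameters in the first row,
# survival parameters on the subdiagonal.
L = [[F(0), F(4), F(3)],
     [F(1, 2), F(0), F(0)],
     [F(0), F(1, 4), F(0)]]

x = [F(1000), F(1000), F(1000)]   # initial age distribution x^(0)
history = [x]
for _ in range(3):                # three 5-year periods, i.e. 15 years
    x = leslie_step(L, x)
    history.append(x)

for k, xk in enumerate(history):
    print(k, [float(v) for v in xk])
```

Each pass through the loop advances the population by one age-class duration, so `history[k]` plays the role of $x^{(k)}$.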


Limiting Behavior 


Although Equation 5 gives the age distribution of the population at any time, it does not immediately give a 
general picture of the dynamics of the growth process. For this we need to investigate the eigenvalues and 
eigenvectors of the Leslie matrix. The eigenvalues of L are the roots of its characteristic polynomial. As we 
ask you to verify in Exercise 2, this characteristic polynomial is 


$$p(\lambda) = |\lambda I - L|
= \lambda^n - a_1 \lambda^{n-1} - a_2 b_1 \lambda^{n-2} - a_3 b_1 b_2 \lambda^{n-3} - \cdots - a_n b_1 b_2 \cdots b_{n-1}$$


To analyze the roots of this polynomial, it will be convenient to introduce the function 


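$$q(\lambda) = \frac{a_1}{\lambda} + \frac{a_2 b_1}{\lambda^2} + \frac{a_3 b_1 b_2}{\lambda^3} + \cdots + \frac{a_n b_1 b_2 \cdots b_{n-1}}{\lambda^n} \tag{6}$$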


Using this function, the characteristic equation $p(\lambda) = 0$ can be written (verify)
$$q(\lambda) = 1 \quad \text{for } \lambda \ne 0 \tag{7}$$
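The step marked "verify" can be filled in as follows. This is a sketch that assumes the characteristic polynomial of the Leslie matrix has the form $\lambda^n - a_1\lambda^{n-1} - a_2 b_1 \lambda^{n-2} - \cdots - a_n b_1 b_2 \cdots b_{n-1}$; for $\lambda \ne 0$, dividing $p(\lambda) = 0$ by $\lambda^n$ gives

```latex
\begin{aligned}
0 = \frac{p(\lambda)}{\lambda^{n}}
  &= 1 - \frac{a_1}{\lambda} - \frac{a_2 b_1}{\lambda^{2}}
       - \frac{a_3 b_1 b_2}{\lambda^{3}} - \cdots
       - \frac{a_n b_1 b_2 \cdots b_{n-1}}{\lambda^{n}} \\
  &= 1 - q(\lambda),
\end{aligned}
\qquad\text{so } p(\lambda) = 0 \iff q(\lambda) = 1 \quad (\lambda \neq 0).
```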


Because all the $a_i$ and $b_i$ are nonnegative, we see that $q(\lambda)$ is monotonically decreasing for $\lambda$ greater than
zero. Furthermore, $q(\lambda)$ has a vertical asymptote at $\lambda = 0$ and approaches zero as $\lambda \to \infty$. Consequently, as
Figure 10.17.1 indicates, there is a unique $\lambda$, say $\lambda = \lambda_1$, such that $q(\lambda_1) = 1$. That is, the matrix $L$ has a
unique positive eigenvalue. It can also be shown (see Exercise 3) that $\lambda_1$ has multiplicity 1; that is, $\lambda_1$ is not a
repeated root of the characteristic equation. Although we omit the computational details, you can verify that
an eigenvector corresponding to $\lambda_1$ is


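$$x_1 = \begin{bmatrix} 1 \\ b_1/\lambda_1 \\ b_1 b_2/\lambda_1^2 \\ b_1 b_2 b_3/\lambda_1^3 \\ \vdots \\ b_1 b_2 \cdots b_{n-1}/\lambda_1^{n-1} \end{bmatrix} \tag{8}$$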


Because $\lambda_1$ has multiplicity 1, its corresponding eigenspace has dimension 1 (Exercise 3), and so any
eigenvector corresponding to it is some multiple of $x_1$. We can summarize these results in the following
theorem.


[Figure 10.17.1: graph of $q(\lambda)$]


THEOREM 10.17.1 Existence of a Positive Eigenvalue 


A Leslie matrix $L$ has a unique positive eigenvalue $\lambda_1$. This eigenvalue has multiplicity 1 and an
eigenvector $x_1$ all of whose entries are positive.
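Because $q(\lambda)$ is decreasing for $\lambda > 0$, the positive eigenvalue can be located numerically by bisection on $q(\lambda) = 1$. The sketch below is one possible approach, not the method of the text; the function names, the bracketing strategy, and the tolerance are assumptions. It is tried on the three-class matrix of Example 1 (birth parameters 0, 4, 3 and survival parameters 1/2, 1/4).

```python
def q(lam, a, b):
    """q(lam) = a1/lam + a2*b1/lam**2 + ... + an*b1*...*b(n-1)/lam**n."""
    total, surv = 0.0, 1.0
    for i, a_i in enumerate(a):
        total += a_i * surv / lam ** (i + 1)
        if i < len(b):
            surv *= b[i]          # running product b1*b2*...*bi
    return total

def positive_eigenvalue(a, b, tol=1e-12):
    """Solve q(lam) = 1 for lam > 0 by bisection (q is decreasing there)."""
    lo, hi = 1e-6, 1.0
    while q(hi, a, b) > 1.0:      # widen the bracket until q(hi) <= 1
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if q(mid, a, b) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def eigenvector(lam, b):
    """Entries 1, b1/lam, b1*b2/lam**2, ... as in Equation 8."""
    x = [1.0]
    for b_i in b:
        x.append(x[-1] * b_i / lam)
    return x

a = [0.0, 4.0, 3.0]   # first row of the Leslie matrix
b = [0.5, 0.25]       # subdiagonal entries
lam1 = positive_eigenvalue(a, b)
x1 = eigenvector(lam1, b)
print(lam1, x1)
```

For this matrix the characteristic polynomial is $\lambda^3 - 2\lambda - \tfrac38$, whose positive root is $\tfrac32$, so the printed eigenvalue should agree with 1.5 up to the tolerance.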


We will now show that the long-term behavior of the age distribution of the population is determined by the
positive eigenvalue $\lambda_1$ and its eigenvector $x_1$. In Exercise 9 we ask you to prove the following result.


THEOREM 10.17.2 Eigenvalues of a Leslie Matrix 


If $\lambda_1$ is the unique positive eigenvalue of a Leslie matrix $L$, and $\lambda_k$ is any other real or complex
eigenvalue of $L$, then $|\lambda_k| \le \lambda_1$.


For our purposes the conclusion in Theorem 10.17.2 is not strong enough; we need $\lambda_1$ to satisfy $|\lambda_k| < \lambda_1$. In
this case $\lambda_1$ would be called the dominant eigenvalue of $L$. However, as the following example shows, not all
Leslie matrices satisfy this condition.


EXAMPLE 2 Leslie Matrix with No Dominant Eigenvalue 


Let

$$L = \begin{bmatrix} 0 & 0 & 6 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac13 & 0 \end{bmatrix}$$

Then the characteristic polynomial of $L$ is
$$p(\lambda) = |\lambda I - L| = \lambda^3 - 1$$

The eigenvalues of $L$ are thus the solutions of $\lambda^3 = 1$, namely,
$$\lambda = 1, \quad -\frac12 + \frac{\sqrt3}{2}\,i, \quad -\frac12 - \frac{\sqrt3}{2}\,i$$

All three eigenvalues have absolute value 1, so the unique positive eigenvalue $\lambda_1 = 1$ is not
dominant. Note that this matrix has the property that $L^3 = I$. This means that for any choice of the
initial age distribution $x^{(0)}$, we have
$$x^{(0)} = x^{(3)} = x^{(6)} = \cdots = x^{(3k)} = \cdots$$

The age distribution vector thus oscillates with a period of three time units. Such oscillations (or
population waves, as they are called) could not occur if $\lambda_1$ were dominant, as we will see below.
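The period-three oscillation can be checked directly. The following sketch multiplies the matrix of Example 2 out exactly; the starting vector (600, 200, 100) is a hypothetical choice made only for illustration, and the helper names are assumptions.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    """Product of a matrix and a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

# Leslie matrix of Example 2: only the third age class is fertile.
L = [[F(0), F(0), F(6)],
     [F(1, 2), F(0), F(0)],
     [F(0), F(1, 3), F(0)]]

L3 = matmul(L, matmul(L, L))      # should be the 3 x 3 identity matrix

x0 = [F(600), F(200), F(100)]     # hypothetical initial age distribution
x = x0
trajectory = [x0]
for _ in range(3):
    x = matvec(L, x)
    trajectory.append(x)

print(L3)
print(trajectory)
```

After three steps the vector returns to its starting value, which is the population wave described in the example.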


It is beyond the scope of this book to discuss necessary and sufficient conditions for $\lambda_1$ to be a dominant
eigenvalue. However, we will state the following sufficient condition without proof.


THEOREM 10.17.3 Dominant Eigenvalue 


If two successive entries $a_i$ and $a_{i+1}$ in the first row of a Leslie matrix $L$ are nonzero, then the
positive eigenvalue of $L$ is dominant.


Thus, if the female population has two successive fertile age classes, then its Leslie matrix has a dominant 
eigenvalue. This is always the case for realistic populations if the duration of the age classes is sufficiently 
small. Note that in Example 2 there is only one fertile age class (the third), so the condition of Theorem 
10.17.3 is not satisfied. In what follows, we always assume that the condition of Theorem 10.17.3 is satisfied. 


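Let us assume that $L$ is diagonalizable. This is not really necessary for the conclusions we will draw, but it
does simplify the arguments. In this case, $L$ has $n$ eigenvalues, $\lambda_1, \lambda_2, \ldots, \lambda_n$, not necessarily distinct, and $n$
linearly independent eigenvectors, $x_1, x_2, \ldots, x_n$, corresponding to them. In this listing we place the dominant
eigenvalue $\lambda_1$ first. We construct a matrix $P$ whose columns are the eigenvectors of $L$:

$$P = [\,x_1 \mid x_2 \mid x_3 \mid \cdots \mid x_n\,]$$

The diagonalization of $L$ is then given by the equation

$$L = P \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} P^{-1}$$

From this it follows that

$$L^k = P \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix} P^{-1}$$

for $k = 1, 2, \ldots$. For any initial age distribution vector $x^{(0)}$, we then have

$$L^k x^{(0)} = P \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix} P^{-1} x^{(0)}$$

for $k = 1, 2, \ldots$. Dividing both sides of this equation by $\lambda_1^k$ and using the fact that $x^{(k)} = L^k x^{(0)}$, we have

$$\frac{1}{\lambda_1^k}\, x^{(k)} = P \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & \left(\frac{\lambda_2}{\lambda_1}\right)^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \left(\frac{\lambda_n}{\lambda_1}\right)^k \end{bmatrix} P^{-1} x^{(0)} \tag{9}$$

Because $\lambda_1$ is the dominant eigenvalue, we have $|\lambda_i / \lambda_1| < 1$ for $i = 2, 3, \ldots, n$. It follows that
$$(\lambda_i / \lambda_1)^k \to 0 \quad \text{as } k \to \infty \quad \text{for } i = 2, 3, \ldots, n$$
Using this fact, we can take the limit of both sides of 9 to obtain

$$\lim_{k \to \infty} \left\{ \frac{1}{\lambda_1^k}\, x^{(k)} \right\} = P \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} P^{-1} x^{(0)} \tag{10}$$

Let us denote the first entry of the column vector $P^{-1} x^{(0)}$ by the constant $c$. As we ask you to show in
Exercise 4, the right side of 10 can be written as $c x_1$, where $c$ is a positive constant that depends only on the
initial age distribution vector $x^{(0)}$. Thus, 10 becomes

$$\lim_{k \to \infty} \left\{ \frac{1}{\lambda_1^k}\, x^{(k)} \right\} = c x_1 \tag{11}$$
Equation 11 gives us the approximation
$$x^{(k)} \approx c \lambda_1^k x_1 \tag{12}$$
for large values of $k$. From 12 we also have
$$x^{(k-1)} \approx c \lambda_1^{k-1} x_1 \tag{13}$$
Comparing Equations 12 and 13, we see that
$$x^{(k)} \approx \lambda_1 x^{(k-1)} \tag{14}$$
for large values of $k$. This means that for large values of time, each age distribution vector is a scalar multiple
of the preceding age distribution vector, the scalar being the positive eigenvalue of the Leslie matrix.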


Consequently, the proportion of females in each of the age classes becomes constant. As we will see in the
following example, these limiting proportions can be determined from the eigenvector $x_1$.


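EXAMPLE 3 Example 1 Revisited


The Leslie matrix in Example 1 was

$$L = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix}$$

Its characteristic polynomial is $p(\lambda) = \lambda^3 - 2\lambda - \tfrac38$, and you can verify that the positive
eigenvalue is $\lambda_1 = \tfrac32$. From 8 the corresponding eigenvector $x_1$ is

$$x_1 = \begin{bmatrix} 1 \\ b_1/\lambda_1 \\ b_1 b_2/\lambda_1^2 \end{bmatrix}
= \begin{bmatrix} 1 \\ \tfrac12 \big/ \tfrac32 \\ \left(\tfrac12\right)\left(\tfrac14\right) \big/ \left(\tfrac32\right)^2 \end{bmatrix}
= \begin{bmatrix} 1 \\ \tfrac13 \\ \tfrac1{18} \end{bmatrix}$$

From 14 we have
$$x^{(k)} \approx \tfrac32\, x^{(k-1)}$$

for large values of $k$. Hence, every five years the number of females in each of the three classes
will increase by about 50%, as will the total number of females in the population.


From 12 we have

$$x^{(k)} \approx c \left(\tfrac32\right)^k \begin{bmatrix} 1 \\ \tfrac13 \\ \tfrac1{18} \end{bmatrix}$$

Consequently, eventually the females will be distributed among the three age classes in the ratios
$1 : \tfrac13 : \tfrac1{18}$. This corresponds to a distribution of 72% of the females in the first age class, 24% of the
females in the second age class, and 4% of the females in the third age class.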
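The limiting behavior in this example can also be observed by brute force. The sketch below simply iterates the three-class Leslie matrix (first row 0, 4, 3; survival parameters 1/2 and 1/4) for many periods in floating point; the number of iterations (100) and the starting vector of 1000 females per class are assumptions chosen so the transient has died out.

```python
def leslie_step(L, x):
    """One growth period: return L x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]

L = [[0.0, 4.0, 3.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.25, 0.0]]

x = [1000.0, 1000.0, 1000.0]
prev = x
for _ in range(100):              # 100 periods of five years each
    prev, x = x, leslie_step(L, x)

growth = sum(x) / sum(prev)       # ratio of successive population totals
shares = [v / sum(x) for v in x]  # proportion of females in each class

print(round(growth, 4))
print([round(s, 4) for s in shares])
```

The growth factor should settle near the positive eigenvalue 3/2, and the shares near 0.72, 0.24, and 0.04, in line with the ratios $1 : \tfrac13 : \tfrac1{18}$.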


EXAMPLE 4 Female Age Distribution for Humans 


In this example we use birth and death parameters from the year 1965 for Canadian females. 
Because few women over 50 years of age bear children, we restrict ourselves to the portion of the 
female population between 0 and 50 years of age. The data are for 5-year age classes, so there are a 
total of 10 age classes. Rather than writing out the 10 x 10 Leslie matrix in full, we list the birth 
and death parameters as follows: 


Age Interval | a_i     | b_i
[0, 5)       | 0.00000 | 0.99651
[5, 10)      | 0.00024 | 0.99820
[10, 15)     | 0.05861 | 0.99802
[15, 20)     | 0.28608 | 0.99729
[20, 25)     | 0.44791 | 0.99694
[25, 30)     | 0.36399 | 0.99621
[30, 35)     | 0.22259 | 0.99460
[35, 40)     | 0.10457 | 0.99184
[40, 45)     | 0.02826 | 0.98700
[45, 50)     | 0.00240 | -


Using numerical techniques, we can approximate the positive eigenvalue and corresponding
eigenvector by

$$\lambda_1 = 1.07622 \quad \text{and} \quad x_1 = \begin{bmatrix} 1.00000 \\ 0.92594 \\ 0.85881 \\ 0.79641 \\ 0.73800 \\ 0.68364 \\ 0.63281 \\ 0.58482 \\ 0.53897 \\ 0.49429 \end{bmatrix}$$

Thus, if Canadian women continued to reproduce and die as they did in 1965, eventually every 5
years their numbers would increase by 7.622%. From the eigenvector $x_1$, we see that, in the limit,
for every 100,000 females between 0 and 5 years of age, there will be 92,594 females between 5
and 10 years of age, 85,881 females between 10 and 15 years of age, and so forth.


Let us look again at Equation 12, which gives the age distribution vector of the population for large times:
$$x^{(k)} \approx c \lambda_1^k x_1 \tag{15}$$

Three cases arise according to the value of the positive eigenvalue $\lambda_1$:
(i) The population is eventually increasing if $\lambda_1 > 1$.
(ii) The population is eventually decreasing if $\lambda_1 < 1$.
(iii) The population eventually stabilizes if $\lambda_1 = 1$.
The case $\lambda_1 = 1$ is particularly interesting because it determines a population that has zero population
growth. For any initial age distribution, the population approaches a limiting age distribution that is some
multiple of the eigenvector $x_1$. From Equations 6 and 7, we see that $\lambda_1 = 1$ is an eigenvalue if and only if

$$a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1} = 1 \tag{16}$$
The expression
$$R = a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1} \tag{17}$$

is called the net reproduction rate of the population. (See Exercise 5 for a demographic interpretation of R.)
Thus, we can say that a population has zero population growth if and only if its net reproduction rate is 1.
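The net reproduction rate is a running sum of birth parameters weighted by cumulative survival, which makes it easy to compute. The following sketch (the function name `net_reproduction_rate` is an assumption) evaluates it exactly for the two three-class matrices used in the examples of this section.

```python
from fractions import Fraction as F

def net_reproduction_rate(a, b):
    """R = a1 + a2*b1 + a3*b1*b2 + ... + an*b1*...*b(n-1)."""
    R, surv = F(0), F(1)
    for i, a_i in enumerate(a):
        R += a_i * surv           # daughters from class i+1, weighted by survival to it
        if i < len(b):
            surv *= b[i]
    return R

# Example 1: first row (0, 4, 3), survival parameters (1/2, 1/4).
R_example1 = net_reproduction_rate([F(0), F(4), F(3)], [F(1, 2), F(1, 4)])

# Example 2: first row (0, 0, 6), survival parameters (1/2, 1/3).
R_example2 = net_reproduction_rate([F(0), F(0), F(6)], [F(1, 2), F(1, 3)])

print(R_example1, R_example2)
```

The second matrix has $R = 1$, consistent with its positive eigenvalue being 1; the first has $R > 1$, consistent with a growing population.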


Exercise Set 10.17 


1. Suppose that a certain animal population is divided into two age classes and has a Leslie matrix
$$L = \begin{bmatrix} 1 & \tfrac32 \\ \tfrac12 & 0 \end{bmatrix}$$
(a) Calculate the positive eigenvalue $\lambda_1$ of $L$ and the corresponding eigenvector $x_1$.


(b) Beginning with the initial age distribution vector
$$x^{(0)} = \begin{bmatrix} 100 \\ 0 \end{bmatrix}$$
calculate $x^{(1)}$, $x^{(2)}$, $x^{(3)}$, $x^{(4)}$, and $x^{(5)}$, rounding off to the nearest integer when necessary.


(c) Calculate $x^{(6)}$ using the exact formula $x^{(6)} = Lx^{(5)}$ and using the approximation formula $x^{(6)} \approx \lambda_1 x^{(5)}$.


Answer: 


(a) $\lambda_1 = \tfrac32$, $x_1 = \begin{bmatrix} 1 \\ \tfrac13 \end{bmatrix}$


(b) = (1 20 =|" 20 =| x0 =| 2 =[00] 
() ,O@ s~9 1897] .O oy ~O_ |] 89 
x Pe be , x’ Aix 5 


2. Find the characteristic polynomial of a general Leslie matrix given by Equation 4. 


3. (a) Show that the positive eigenvalue $\lambda_1$ of a Leslie matrix is always simple. Recall that a root $\lambda_0$ of a
polynomial $q(\lambda)$ is simple if and only if $q'(\lambda_0) \ne 0$.


(b) Show that the eigenspace corresponding to $\lambda_1$ has dimension 1.
4. Show that the right side of Equation 10 is $c x_1$, where $c$ is the first entry of the column vector $P^{-1} x^{(0)}$.


5. Show that the net reproduction rate R, defined by 17, can be interpreted as the average number of 
daughters born to a single female during her expected lifetime. 


6. Show that a population is eventually decreasing if and only if its net reproduction rate is less than 1. 
Similarly, show that a population is eventually increasing if and only if its net reproduction rate is greater 
than 1. 


7. Calculate the net reproduction rate of the animal population in Example 1. 
Answer: 


2.375


8. (For readers with a hand calculator) Calculate the net reproduction rate of the Canadian female 
population in Example 4. 


Answer: 


1.49611 


9. (For readers who have read Section 10.1—Section 10.3) Prove Theorem 10.17.2. [Hint: Write $\lambda_k = re^{i\theta}$,
substitute into 7, take the real parts of both sides, and show that $r \le \lambda_1$.]


Section 10.17 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be 
MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra 
software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to 
read the relevant documentation for the particular utility you are using. The goal of these exercises is to 
provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in 
these exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 


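T1. Consider the sequence of Leslie matrices

$$L_2 = \begin{bmatrix} 0 & a \\ b_1 & 0 \end{bmatrix}, \quad
L_3 = \begin{bmatrix} 0 & 0 & a \\ b_1 & 0 & 0 \\ 0 & b_2 & 0 \end{bmatrix},$$

$$L_4 = \begin{bmatrix} 0 & 0 & 0 & a \\ b_1 & 0 & 0 & 0 \\ 0 & b_2 & 0 & 0 \\ 0 & 0 & b_3 & 0 \end{bmatrix}, \quad
L_5 = \begin{bmatrix} 0 & 0 & 0 & 0 & a \\ b_1 & 0 & 0 & 0 & 0 \\ 0 & b_2 & 0 & 0 & 0 \\ 0 & 0 & b_3 & 0 & 0 \\ 0 & 0 & 0 & b_4 & 0 \end{bmatrix}, \quad \ldots$$

(a) Use a computer to show that
$$L_2^2 = I_2, \quad L_3^3 = I_3, \quad L_4^4 = I_4, \quad L_5^5 = I_5, \quad \ldots$$
for a suitable choice of $a$ in terms of $b_1, b_2, \ldots, b_{n-1}$.


(b) From your results in part (a), conjecture a relationship between $a$ and $b_1, b_2, \ldots, b_{n-1}$ that will make
$L_n^n = I_n$, where

$$L_n = \begin{bmatrix}
0 & 0 & 0 & \cdots & 0 & a \\
b_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & b_2 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{n-1} & 0
\end{bmatrix}$$

(c) Determine an expression for $p_n(\lambda) = |\lambda I_n - L_n|$ and use it to show that all eigenvalues of $L_n$ satisfy
$|\lambda| = 1$ when $a$ and $b_1, b_2, \ldots, b_{n-1}$ are related by the equation determined in part (b).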


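T2. Consider the sequence of Leslie matrices

$$L_2 = \begin{bmatrix} a & ap \\ b & 0 \end{bmatrix}, \quad
L_3 = \begin{bmatrix} a & ap & ap^2 \\ b & 0 & 0 \\ 0 & b & 0 \end{bmatrix}, \quad
L_4 = \begin{bmatrix} a & ap & ap^2 & ap^3 \\ b & 0 & 0 & 0 \\ 0 & b & 0 & 0 \\ 0 & 0 & b & 0 \end{bmatrix},$$

$$L_5 = \begin{bmatrix} a & ap & ap^2 & ap^3 & ap^4 \\ b & 0 & 0 & 0 & 0 \\ 0 & b & 0 & 0 & 0 \\ 0 & 0 & b & 0 & 0 \\ 0 & 0 & 0 & b & 0 \end{bmatrix}, \quad \ldots, \quad
L_n = \begin{bmatrix} a & ap & ap^2 & \cdots & ap^{n-2} & ap^{n-1} \\ b & 0 & 0 & \cdots & 0 & 0 \\ 0 & b & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b & 0 \end{bmatrix}$$

where $0 < p < 1$, $0 < b < 1$, and $1 < a$.
(a) Choose a value for $n$ (say, $n = 8$). For various values of $a$, $b$, and $p$, use a computer to determine the
dominant eigenvalue of $L_n$, and then compare your results to the value of $a + bp$.


(b) Show that

$$p_n(\lambda) = |\lambda I_n - L_n| = \lambda^n - a \left( \frac{\lambda^n - (bp)^n}{\lambda - bp} \right)$$

which means that the eigenvalues of $L_n$ must satisfy
$$\lambda^{n+1} - (a + bp)\lambda^n + a(bp)^n = 0$$
(c) Can you now provide a rough proof to explain the fact that $\lambda_1 \approx a + bp$?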


T3. Suppose that a population of mice has a Leslie matrix $L$ over a 1-month period and an initial age
distribution vector $x^{(0)}$ given by


143 

OR Sy Ss ao 

2 Ooo Oe OB 
5 50 
02 0000 ay 
10 (a) 30 
i= 9 and x'" =| 45 
00 #0 0 0 ae 
4 5 

00 0 2 0 0 

eS 

O00 OF 0 


(a) Compute the net reproduction rate of the population. 

(b) Compute the age distribution vector after 100 months and 101 months, and show that the vector after 101
months is approximately a scalar multiple of the vector after 100 months.

(c) Compute the dominant eigenvalue of L and its corresponding eigenvector. How are they related to your 
results in part (b)? 

(d) Suppose you wish to control the mouse population by feeding it a substance that decreases its age-specific 
birthrates (the entries in the first row of L) by a constant fraction. What range of fractions would cause the 
population eventually to decrease? 




10.18 Harvesting of Animal Populations 


In this section we employ the Leslie matrix model of population growth to model the sustainable harvesting 
of an animal population. We also examine the effect of harvesting different fractions of different age groups. 


Prerequisites 


Age-Specific Population Growth (Section 10.17) 


Harvesting 


In Section 10.17 we used the Leslie matrix model to examine the growth of a female population that was 
divided into discrete age classes. In this section, we investigate the effects of harvesting an animal population 
growing according to such a model. By harvesting we mean the removal of animals from the population. 
(The word harvesting is not necessarily a euphemism for “slaughtering”; the animals may be removed from 
the population for other purposes.) 


In this section we restrict ourselves to sustainable harvesting policies. By this we mean the following: 


DEFINITION 1 


A harvesting policy in which an animal population is periodically harvested is said to be sustainable 
if the yield of each harvest is the same and the age distribution of the population remaining after each 
harvest is the same. 


Thus, the animal population is not depleted by a sustainable harvesting policy; only the excess growth is 
removed. 


As in Section 10.17, we will discuss only the females of the population. If the number of males in each age 
class is equal to the number of females—a reasonable assumption for many populations—then our harvesting 
policies will also apply to the male portion of the population. 


The Harvesting Model 


Figure 10.18.1 illustrates the basic idea of the model. We begin with a population having a particular age 
distribution. It undergoes a growth period that will be described by the Leslie matrix. At the end of the growth 
period, a certain fraction of each age class is harvested in such a way that the unharvested population has the 


same age distribution as the original population. This cycle repeats after each harvest so that the yield is 
sustainable. The duration of the harvest is assumed to be short in comparison with the growth period so that 
any growth or change in the population during the harvest period can be neglected. 


[Figure 10.18.1: panels show the population before the growth period, the population after the growth period, and the population harvested, divided into harvested and not harvested portions]


To describe this harvesting model mathematically, let

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

be the age distribution vector of the population at the beginning of the growth period. Thus $x_i$ is the number
of females in the $i$th class left unharvested. As in Section 10.17, we require that the duration of each age class
be identical with the duration of the growth period. For example, if the population is harvested once a year,
then the population is divided into 1-year age classes.


If $L$ is the Leslie matrix describing the growth of the population, then the vector $Lx$ is the age distribution
vector of the population at the end of the growth period, immediately before the periodic harvest. Let $h_i$, for
$i = 1, 2, \ldots, n$, be the fraction of females from the $i$th class that is harvested. We use these $n$ numbers to form
an $n \times n$ diagonal matrix

$$H = \begin{bmatrix} h_1 & 0 & 0 & \cdots & 0 \\ 0 & h_2 & 0 & \cdots & 0 \\ 0 & 0 & h_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & h_n \end{bmatrix}$$

which we will call the harvesting matrix. By definition, we have
$$0 \le h_i \le 1 \quad (i = 1, 2, \ldots, n)$$
That is, we can harvest none ($h_i = 0$), all ($h_i = 1$), or some fraction ($0 < h_i < 1$) of each of the $n$ classes.


Because the number of females in the $i$th class immediately before each harvest is the $i$th entry $(Lx)_i$ of the
vector $Lx$, the $i$th entry of the column vector

$$HLx = \begin{bmatrix} h_1 (Lx)_1 \\ h_2 (Lx)_2 \\ \vdots \\ h_n (Lx)_n \end{bmatrix}$$

is the number of females harvested from the $i$th class.


From the definition of a sustainable harvesting policy, we have

{age distribution at end of growth period} - {harvest} = {age distribution at beginning of growth period}

or, mathematically,
$$Lx - HLx = x \tag{1}$$

If we write Equation 1 in the form
$$(I - H)Lx = x \tag{2}$$

we see that $x$ must be an eigenvector of the matrix $(I - H)L$ corresponding to the eigenvalue 1. As we will
now show, this places certain restrictions on the values of $h_i$ and $x$.


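Suppose that the Leslie matrix of the population is

$$L = \begin{bmatrix}
a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\
b_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & b_2 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{n-1} & 0
\end{bmatrix} \tag{3}$$

Then the matrix $(I - H)L$ is (verify)

$$(I - H)L = \begin{bmatrix}
(1-h_1)a_1 & (1-h_1)a_2 & (1-h_1)a_3 & \cdots & (1-h_1)a_{n-1} & (1-h_1)a_n \\
(1-h_2)b_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & (1-h_3)b_2 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & (1-h_n)b_{n-1} & 0
\end{bmatrix}$$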

Thus, we see that $(I - H)L$ is a matrix with the same mathematical form as a Leslie matrix. In Section 10.17
we showed that a necessary and sufficient condition for a Leslie matrix to have 1 as an eigenvalue is that its
net reproduction rate also be 1 [see Eq. 16 of Section 10.17]. Calculating the net reproduction rate of
$(I - H)L$ and setting it equal to 1, we obtain (verify)

$$(1 - h_1)\left[a_1 + a_2 b_1 (1 - h_2) + a_3 b_1 b_2 (1 - h_2)(1 - h_3) + \cdots + a_n b_1 b_2 \cdots b_{n-1} (1 - h_2)(1 - h_3) \cdots (1 - h_n)\right] = 1 \tag{4}$$

This equation places a restriction on the allowable harvesting fractions. Only those values of $h_1, h_2, \ldots, h_n$
that satisfy 4 and that lie in the interval [0, 1] can produce a sustainable yield.


If $h_1, h_2, \ldots, h_n$ do satisfy 4, then the matrix $(I - H)L$ has the desired eigenvalue $\lambda_1 = 1$. Furthermore, this
eigenvalue has multiplicity 1, because the positive eigenvalue of a Leslie matrix always has multiplicity 1
(Theorem 10.17.1). This means that there is only one linearly independent eigenvector $x$ satisfying Equation
2. [See Exercise 3(b) of Section 10.17.] One possible choice for $x$ is the following normalized eigenvector:

$$x_1 = \begin{bmatrix}
1 \\
b_1 (1 - h_2) \\
b_1 b_2 (1 - h_2)(1 - h_3) \\
b_1 b_2 b_3 (1 - h_2)(1 - h_3)(1 - h_4) \\
\vdots \\
b_1 b_2 b_3 \cdots b_{n-1} (1 - h_2)(1 - h_3) \cdots (1 - h_n)
\end{bmatrix} \tag{5}$$


Any other solution $x$ of 2 is a multiple of $x_1$. Thus, the vector $x_1$ determines the proportion of females within
each of the $n$ classes after a harvest under a sustainable harvesting policy. But there is an ambiguity in the
total number of females in the population after each harvest. This can be determined by some auxiliary
condition, such as an ecological or economic constraint. For example, for a population economically
supported by the harvester, the largest population the harvester can afford to raise between harvests would
determine the particular constant that $x_1$ is multiplied by to produce the appropriate vector $x$ in Equation 2.
For a wild population, the natural habitat of the population would determine how large the total population
could be between harvests.
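The sustainability condition and the normalized post-harvest vector can be checked numerically for any proposed set of harvesting fractions. The sketch below is illustrative only: the function names are assumptions, and the data are a hypothetical three-class population with birth parameters (0, 4, 3), survival parameters (1/2, 1/4), and the same harvesting fraction 1/3 applied to every class.

```python
from fractions import Fraction as F

def harvested_net_reproduction(a, b, h):
    """Net reproduction rate of (I - H)L, the left side of the sustainability condition."""
    total, prod = F(0), F(1)
    for i, a_i in enumerate(a):
        total += a_i * prod
        if i < len(b):
            prod *= b[i] * (1 - h[i + 1])   # b1(1-h2), b1 b2 (1-h2)(1-h3), ...
    return (1 - h[0]) * total

def post_harvest_vector(b, h):
    """Normalized eigenvector of (I - H)L for eigenvalue 1 (first entry 1)."""
    x = [F(1)]
    for i, b_i in enumerate(b):
        x.append(x[-1] * b_i * (1 - h[i + 1]))
    return x

def grow_then_harvest(a, b, h, x):
    """Apply one growth period (L x) followed by one harvest (I - H)."""
    grown = [sum(a_i * x_i for a_i, x_i in zip(a, x))]
    grown += [b[i] * x[i] for i in range(len(b))]
    return [(1 - h[i]) * grown[i] for i in range(len(a))]

a = [F(0), F(4), F(3)]            # hypothetical birth parameters
b = [F(1, 2), F(1, 4)]            # hypothetical survival parameters
h = [F(1, 3), F(1, 3), F(1, 3)]   # hypothetical harvesting fractions

condition = harvested_net_reproduction(a, b, h)
x1 = post_harvest_vector(b, h)
after_cycle = grow_then_harvest(a, b, h, x1)
print(condition, x1, after_cycle)
```

When the condition evaluates to 1, one full cycle of growth and harvest returns the vector unchanged, which is what sustainability requires.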


Summarizing our results so far, we see that there is a wide choice in the values of $h_1, h_2, \ldots, h_n$ that will
produce a sustainable yield. But once these values are selected, the proportional age distribution of the
population after each harvest is uniquely determined by the normalized eigenvector $x_1$ defined by Equation 5.
We now consider a few particular harvesting strategies of this type.


Uniform Harvesting 


With many populations it is difficult to distinguish or catch animals of specific ages. If animals are caught at 
random, we can reasonably assume that the same fraction of each age class is harvested. We therefore set 


$$h = h_1 = h_2 = \cdots = h_n$$
Equation 2 then reduces to (verify)
$$Lx = \left(\frac{1}{1 - h}\right) x$$
Hence, $1/(1 - h)$ must be the unique positive eigenvalue $\lambda_1$ of the Leslie growth matrix $L$. That is,
$$\lambda_1 = \frac{1}{1 - h}$$

Solving for the harvesting fraction $h$, we obtain
$$h = 1 - (1/\lambda_1) \tag{6}$$
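The reduction can be written out in one line. With the same fraction $h$ harvested from every class, the harvesting matrix is $H = hI$, so the sustainability equation $(I - H)Lx = x$ becomes the following; the numerical value $\lambda_1 = \tfrac32$ at the end is only an illustration, taken from the three-class matrix of Example 3 in Section 10.17.

```latex
(I - hI)Lx = x
\;\Longrightarrow\; (1 - h)\,Lx = x
\;\Longrightarrow\; Lx = \frac{1}{1 - h}\,x ,
\qquad\text{hence}\qquad
\lambda_1 = \frac{1}{1 - h}, \quad h = 1 - \frac{1}{\lambda_1}.
% Illustration: \lambda_1 = 3/2 gives h = 1 - 2/3 = 1/3.
```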


The vector $x_1$, in this case, is the same as the eigenvector of $L$ corresponding to the eigenvalue $\lambda_1$. From
Equation 8 of Section 10.17, this is

$$x_1 = \begin{bmatrix} 1 \\ b_1/\lambda_1 \\ b_1 b_2/\lambda_1^2 \\ b_1 b_2 b_3/\lambda_1^3 \\ \vdots \\ b_1 b_2 \cdots b_{n-1}/\lambda_1^{n-1} \end{bmatrix} \tag{7}$$

From 6 we can see that the larger $\lambda_1$ is, the larger is the fraction of animals we can harvest without depleting
the population. Note that we need $\lambda_1 > 1$ in order for the harvesting fraction $h$ to lie in the interval (0, 1).
This is to be expected, because $\lambda_1 > 1$ is the condition that the population be increasing.


EXAMPLE 1 Harvesting Sheep


For a certain species of domestic sheep in New Zealand with a growth period of 1 year, the
following Leslie matrix was found (see G. Caughley, “Parameters for Seasonally Breeding
Populations,” Ecology, 48, 1967, pp. 834-839).

$$L = \begin{bmatrix}
.000 & .045 & .391 & .472 & .484 & .546 & .543 & .502 & .468 & .459 & .433 & .421 \\
.845 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & .975 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & .965 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & .950 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & .926 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & .895 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & .850 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & .786 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & .691 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & .561 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & .370 & 0
\end{bmatrix}$$

The sheep have a lifespan of 12 years, so they are divided into 12 age classes of duration 1 year
each. By the use of numerical techniques, the unique positive eigenvalue of $L$ can be found to
be

$$\lambda_1 = 1.176$$
From Equation 6, the harvesting fraction $h$ is
$$h = 1 - (1/\lambda_1) = 1 - (1/1.176) = .150$$

Thus, the uniform harvesting policy is one in which 15.0% of the sheep from each of the 12
age classes is harvested every year. From 7 the age distribution vector of the sheep after each
harvest is proportional to

$$x_1 = \begin{bmatrix} 1.000 \\ 0.719 \\ 0.596 \\ 0.489 \\ 0.395 \\ 0.311 \\ 0.237 \\ 0.171 \\ 0.114 \\ 0.067 \\ 0.032 \\ 0.010 \end{bmatrix} \tag{8}$$

From 8 we see that for every 1000 sheep between 0 and 1 year of age that are not harvested,
there are 719 sheep between 1 and 2 years of age, 596 sheep between 2 and 3 years of age, and
so forth.
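The "numerical techniques" are not specified in the text. One simple possibility, sketched here under that caveat, is bisection on the equation $q(\lambda) = 1$ from Section 10.17, followed by $h = 1 - 1/\lambda_1$. The function names and tolerance are assumptions; the birth and survival parameters are the first row and subdiagonal of the sheep Leslie matrix.

```python
def q(lam, a, b):
    """q(lam) = a1/lam + a2*b1/lam**2 + ... + an*b1*...*b(n-1)/lam**n."""
    total, surv = 0.0, 1.0
    for i, a_i in enumerate(a):
        total += a_i * surv / lam ** (i + 1)
        if i < len(b):
            surv *= b[i]
    return total

def positive_eigenvalue(a, b, tol=1e-10):
    """Bisection for the unique positive solution of q(lam) = 1."""
    lo, hi = 1e-6, 1.0
    while q(hi, a, b) > 1.0:      # widen the bracket until q(hi) <= 1
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if q(mid, a, b) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Sheep data: first row and subdiagonal of the Leslie matrix.
a = [0.000, 0.045, 0.391, 0.472, 0.484, 0.546,
     0.543, 0.502, 0.468, 0.459, 0.433, 0.421]
b = [0.845, 0.975, 0.965, 0.950, 0.926, 0.895,
     0.850, 0.786, 0.691, 0.561, 0.370]

lam1 = positive_eigenvalue(a, b)
h = 1.0 - 1.0 / lam1              # uniform harvesting fraction

# Post-harvest age distribution, normalized so the first entry is 1.
x1 = [1.0]
for b_i in b:
    x1.append(x1[-1] * b_i / lam1)

print(round(lam1, 3), round(h, 3))
print([round(v, 3) for v in x1])
```

The computed values should be close to the rounded figures quoted in the example; small differences in the last digit of $h$ can arise because the text computes $h$ from the already rounded eigenvalue 1.176.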


Harvesting Only the Youngest Age Class 


In some populations only the youngest females are of any economic value, so the harvester seeks to harvest 
only the females from the youngest age class. Accordingly, let us set 


$$h_1 = h, \qquad h_2 = h_3 = \cdots = h_n = 0$$
Equation 4 then reduces to
$$(1 - h)(a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1}) = 1$$
or
$$(1 - h)R = 1$$

where $R$ is the net reproduction rate of the population. [See Equation 17 of Section 10.17.] Solving for $h$, we
obtain

$$h = 1 - (1/R) \tag{9}$$

Note from this equation that a sustainable harvesting policy is possible only if $R \ge 1$. This is reasonable
because only if $R \ge 1$ is the population increasing. From Equation 5, the age distribution vector after each
harvest is proportional to the vector

$$x_1 = \begin{bmatrix} 1 \\ b_1 \\ b_1 b_2 \\ b_1 b_2 b_3 \\ \vdots \\ b_1 b_2 b_3 \cdots b_{n-1} \end{bmatrix} \tag{10}$$

EXAMPLE 2 Sustainable Harvesting Policy 
Let us apply this type of sustainable harvesting policy to the sheep population in Example 1. 
For the net reproduction rate of the population we find 
$$
\begin{aligned}
R &= a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1} \\
  &= (.000) + (.045)(.845) + \cdots + (.421)(.845)(.975)\cdots(.370) \\
  &= 2.514
\end{aligned}
$$
From Equation 9, the fraction of the first age class harvested is
$$h = 1 - (1/R) = 1 - (1/2.514) = .602$$
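From Equation 10, the age distribution of the sheep population after the harvest is proportional
to the vector

$$
\mathbf{x}_1 = \begin{bmatrix}
1.000 \\ .845 \\ (.845)(.975) \\ (.845)(.975)(.965) \\ \vdots \\ (.845)(.975)\cdots(.370)
\end{bmatrix}
= \begin{bmatrix}
1.000 \\ 0.845 \\ 0.824 \\ 0.795 \\ 0.755 \\ 0.699 \\ 0.626 \\ 0.532 \\ 0.418 \\ 0.289 \\ 0.162 \\ 0.060
\end{bmatrix} \tag{11}
$$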


A direct calculation gives us the following (see also Exercise 3):

$$
L\mathbf{x}_1 = \begin{bmatrix}
2.514 \\ 0.845 \\ 0.824 \\ 0.795 \\ 0.755 \\ 0.699 \\ 0.626 \\ 0.532 \\ 0.418 \\ 0.289 \\ 0.162 \\ 0.060
\end{bmatrix} \tag{12}
$$


The vector $L\mathbf{x}_1$ is the age distribution vector immediately before the harvest. The total of all
entries in $L\mathbf{x}_1$ is 8.520, so the first entry 2.514 is 29.5% of the total. This means that
immediately before each harvest, 29.5% of the population is in the youngest age class. Since
60.2% of this class is harvested, it follows that 17.8% (= 60.2% of 29.5%) of the entire sheep
population is harvested each year. This can be compared with the uniform harvesting policy of
Example 1, in which 15.0% of the sheep population is harvested each year.
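
The arithmetic of Example 2 can be reproduced in a few lines. The sketch below is illustrative only: the function name is hypothetical, and it relies on the observation that the first entry of $L\mathbf{x}_1$ is R while the remaining entries equal those of $\mathbf{x}_1$, as Equations 11 and 12 show.

```python
# Sheep data read from the Leslie matrix of Example 1.
A = [.000, .045, .391, .472, .484, .546, .543, .502, .468, .459, .433, .421]
B = [.845, .975, .965, .950, .926, .895, .850, .786, .691, .561, .370]


def youngest_class_policy(a, b):
    """Return (R, h, yield fraction) when only the youngest class is harvested."""
    # x1 = [1, b1, b1*b2, ...] is the age distribution after harvest (Equation 10).
    x1 = [1.0]
    for rate in b:
        x1.append(x1[-1] * rate)
    # Net reproduction rate R = a1 + a2*b1 + a3*b1*b2 + ...
    R = sum(ai * xi for ai, xi in zip(a, x1))
    h = 1.0 - 1.0 / R  # Equation 9
    # Distribution just before harvest: first entry R, the rest as in x1.
    before = [R] + x1[1:]
    harvested = h * before[0]  # only the youngest class is harvested
    return R, h, harvested / sum(before)


R, h, yield_fraction = youngest_class_policy(A, B)
print(round(R, 3), round(h, 3), round(100 * yield_fraction, 1))
```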


Optimal Sustainable Yield 


We saw in Example 1 that a sustainable harvesting policy in which the same fraction of each age class is
harvested produces a yield of 15.0% of the sheep population. In Example 2 we saw that if only the youngest
age class is harvested, the resulting yield is 17.8% of the population. There are many other possible
sustainable harvesting policies, and each generally provides a different yield. It would be of interest to find a 
sustainable harvesting policy that produces the largest possible yield. Such a policy is called an optimal 
sustainable harvesting policy, and the resulting yield is called the optimal sustainable yield. However, 
determining the optimal sustainable yield requires linear programming theory, which we will not discuss here. 
We refer you to the following result, which appears in J. R. Beddington and D. B. Taylor, “Optimum Age 
Specific Harvesting of a Population,” Biometrics, 29, 1973, pp. 801-809. 


THEOREM 10.18.1 Optimal Sustainable Yield 


An optimal sustainable harvesting policy is one in which either one or two age classes are harvested. 
If two age classes are harvested, then the older age class is completely harvested. 


As an illustration, it can be shown that the optimal sustainable yield of the sheep population is attained when

$$h_1 = 0.522, \qquad h_9 = 1.000 \tag{13}$$

and all other values of $h_i$ are zero. Thus, 52.2% of the sheep between 0 and 1 year of age and all the sheep
between 8 and 9 years of age are harvested. As we ask you to show in Exercise 2, the resulting optimal
sustainable yield is 19.9% of the population.


Exercise Set 10.18 


1. Let a certain animal population be divided into three 1-year age classes and have as its Leslie matrix

$$
L = \begin{bmatrix}
0 & 4 & 3 \\
\tfrac{1}{2} & 0 & 0 \\
0 & \tfrac{1}{4} & 0
\end{bmatrix}
$$


(a) Find the yield and the age distribution vector after each harvest if the same fraction of each of the 
three age classes is harvested every year. 


(b) Find the yield and the age distribution vector after each harvest if only the youngest age class is 
harvested every year. Also, find the fraction of the youngest age class that is harvested. 


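Answer:
(a) Yield = $33\tfrac{1}{3}\%$ of population; $\mathbf{x}_1 = \begin{bmatrix} 1 \\ \tfrac{1}{3} \\ \tfrac{1}{18} \end{bmatrix}$
(b) Yield = 45.8% of population; $\mathbf{x}_1 = \begin{bmatrix} 1 \\ \tfrac{1}{2} \\ \tfrac{1}{8} \end{bmatrix}$; harvest 57.9% of youngest age class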


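2. For the optimal sustainable harvesting policy described by Equations 13, find the vector $\mathbf{x}_1$ that specifies
the age distribution of the population after each harvest. Also calculate the vector $L\mathbf{x}_1$ and verify that the
optimal sustainable yield is 19.9% of the population.


Answer:

$$
\mathbf{x}_1 = \begin{bmatrix}
1.000 \\ .845 \\ .824 \\ .795 \\ .755 \\ .699 \\ .626 \\ .532 \\ 0 \\ 0 \\ 0 \\ 0
\end{bmatrix},
\qquad
L\mathbf{x}_1 = \begin{bmatrix}
2.090 \\ .845 \\ .824 \\ .795 \\ .755 \\ .699 \\ .626 \\ .532 \\ .418 \\ 0 \\ 0 \\ 0
\end{bmatrix},
\qquad
\frac{1.090 + .418}{7.584} = .199
$$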


3. Use Equation 10 to show that if only the first age class of an animal population is harvested, then

$$
L\mathbf{x}_1 - \mathbf{x}_1 = \begin{bmatrix} R - 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
$$

where R is the net reproduction rate of the population.

4. If only the Ith class of an animal population is to be periodically harvested (I = 1, 2, ..., n), find the
corresponding harvesting fraction $h_I$.


Answer:

$$h_I = \frac{R - 1}{a_I b_1 b_2 \cdots b_{I-1} + \cdots + a_n b_1 b_2 \cdots b_{n-1}}$$


5. Suppose that all of the Jth class and a certain fraction $h_I$ of the Ith class of an animal population is to be
periodically harvested ($1 \le I < J \le n$). Calculate $h_I$.


Answer:

$$h_I = \frac{a_1 + a_2 b_1 + \cdots + a_{J-1} b_1 b_2 \cdots b_{J-2} - 1}{a_I b_1 b_2 \cdots b_{I-1} + \cdots + a_{J-1} b_1 b_2 \cdots b_{J-2}}$$


Section 10.18 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the 
relevant documentation for the particular utility you are using. The goal of these exercises is to provide you 
with a basic proficiency with your technology utility. Once you have mastered the techniques in these 
exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 


T1. The results of Theorem 10.18.1 suggest the following algorithm for determining the optimal sustainable
yield.


(i) For each value of i = 1, 2, ..., n, set $h_i = h$ and $h_k = 0$ for $k \ne i$ and calculate the respective yields. These
n calculations give the one-age-class results. Of course, any calculation leading to a value of h not between
0 and 1 is rejected.


(ii) For each value of i = 1, 2, ..., n − 1 and j = i + 1, i + 2, ..., n, set $h_i = h$, $h_j = 1$, and $h_k = 0$ for $k \ne i, j$
and calculate the respective yields. These $\tfrac{1}{2}n(n - 1)$ calculations give the two-age-class results. Of
course, any calculation leading to a value of h not between 0 and 1 is again rejected.

(iii) Of the yields calculated in parts (i) and (ii), the largest is the optimal sustainable yield. Note that there will
be at most

$$n + \tfrac{1}{2}n(n - 1) = \tfrac{1}{2}n(n + 1)$$

calculations in all. Once again, some of these may lead to a value of h not between 0 and 1 and must
therefore be rejected.


If we use this algorithm for the sheep example in the text, there will be at most $\tfrac{1}{2}(12)(12 + 1) = 78$
calculations to consider. Use a computer to do the two-age-class calculations for $h_1 = h$, $h_j = 1$, and $h_k = 0$
for $k \ne 1$ or j for j = 2, 3, ..., 12. Construct a summary table consisting of the values of $h_1$ and the
percentage yields using j = 2, 3, ..., 12, which will show that the largest of these yields occurs when j = 9.
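
One possible way to organize the search in parts (i) to (iii) is sketched below in Python. This is an illustration, not the text's method: the function names are hypothetical, the harvesting fraction is taken from the answers to Exercises 4 and 5, and the yield is measured, as in Example 2, as the harvested fraction of the population just before harvest. The text reports that the optimum for the sheep data is $h_1 = .522$, $h_9 = 1$; the sketch can be used to check that claim.

```python
# Sheep data read from the Leslie matrix of Example 1.
A = [.000, .045, .391, .472, .484, .546, .543, .502, .468, .459, .433, .421]
B = [.845, .975, .965, .950, .926, .895, .850, .786, .691, .561, .370]


def survival_products(b):
    """Return [1, b1, b1*b2, ..., b1*b2*...*b_{n-1}]."""
    products = [1.0]
    for rate in b:
        products.append(products[-1] * rate)
    return products


def policy_yield(a, b, i, j=None):
    """Harvest a fraction h of class i and, optionally, all of class j > i.

    Classes are numbered from 1 as in the text. Returns (h, yield fraction),
    or None when h falls outside [0, 1] and the policy is rejected.
    """
    n = len(a)
    p = survival_products(b)
    c = [a[k] * p[k] for k in range(n)]  # c[k] = a_{k+1} * b_1 * ... * b_k
    last = n if j is None else j - 1     # oldest class that still reproduces
    denom = sum(c[i - 1:last])
    if denom <= 0.0:
        return None
    h = (sum(c[:last]) - 1.0) / denom    # answers to Exercises 4 and 5
    if not 0.0 <= h <= 1.0:
        return None
    fractions = [0.0] * n
    fractions[i - 1] = h
    if j is not None:
        fractions[j - 1] = 1.0
    if fractions[0] >= 1.0:
        return None
    # Follow the classes in order: 'before' is the size just before harvest,
    # 'after' the size just after, scaled so class 1 has size 1 after harvest.
    before = [1.0 / (1.0 - fractions[0])]
    after = 1.0
    for k in range(1, n):
        before.append(b[k - 1] * after)
        after = (1.0 - fractions[k]) * before[k]
    harvested = sum(f * y for f, y in zip(fractions, before))
    return h, harvested / sum(before)


def best_policy(a, b):
    """Try every one-class and two-class policy; return (yield, h, i, j)."""
    n = len(a)
    candidates = []
    for i in range(1, n + 1):
        for j in [None] + list(range(i + 1, n + 1)):
            result = policy_yield(a, b, i, j)
            if result is not None:
                candidates.append((result[1], result[0], i, j))
    return max(candidates, key=lambda item: item[0])


print(policy_yield(A, B, 1, 9))
print(best_policy(A, B))
```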


T2. Using the algorithm in Exercise T1, do the one-age-class calculations for $h_i = h$ and $h_k = 0$ for $k \ne i$ for
i = 1, 2, ..., 12. Construct a summary table consisting of the values of $h_i$ and the percentage yields using
i = 1, 2, ..., 12, which will show that the largest of these yields occurs when i = 9.

T3. Referring to the mouse population in Exercise T3 of Section 10.17, suppose that reducing the birthrates 
is not practical, so you instead decide to control the population by uniformly harvesting all of the age classes 
monthly. 


(a) What fraction of the population must be harvested monthly to bring the mouse population to equilibrium 
eventually? 


(b) What is the equilibrium age distribution vector under this uniform harvesting policy? 


(c) The total number of mice in the original mouse population was 155. What would be the total number of 
mice after 5, 10, and 200 months under your uniform harvesting policy? 




10.19 A Least Squares Model for Human Hearing


In this section we apply the method of least squares approximation to a model for human hearing. The use of this 
method is motivated by energy considerations. 


Prerequisites 


Inner Product Spaces 
Orthogonal Projection 


Fourier Series (Section 6.6) 


Anatomy of the Ear 


We begin with a brief discussion of the nature of sound and human hearing. Figure 10.19.1 is a schematic diagram 
of the ear showing its three main components: the outer ear, middle ear, and inner ear. Sound waves enter the outer 
ear where they are channeled to the eardrum, causing it to vibrate. Three tiny bones in the middle ear mechanically 
link the eardrum with the snail-shaped cochlea within the inner ear. These bones pass on the vibrations of the 
eardrum to a fluid within the cochlea. The cochlea contains thousands of minute hairs that oscillate with the fluid. 
Those near the entrance of the cochlea are stimulated by high frequencies, and those near the tip are stimulated by 
low frequencies. The movements of these hairs activate nerve cells that send signals along various neural pathways 
to the brain, where the signals are interpreted as sound. 


Figure 10.19.1 [Schematic of the ear. Labels: sound wave, middle ear, inner ear, cochlea, auditory nerve]


The sound waves themselves are variations in time of the air pressure. For the auditory system, the most
elementary type of sound wave is a sinusoidal variation in the air pressure. This type of sound wave stimulates the
hairs within the cochlea in such a way that a nerve impulse along a single neural pathway is produced (Figure
10.19.2). A sinusoidal sound wave can be described by a function of time

$$q(t) = A_0 + A\sin(\omega t - \delta) \tag{1}$$

where $q(t)$ is the atmospheric pressure at the eardrum, $A_0$ is the normal atmospheric pressure, $A$ is the maximum
deviation of the pressure from the normal atmospheric pressure, $\omega/2\pi$ is the frequency of the wave in cycles per
second, and $\delta$ is the phase angle of the wave. To be perceived as sound, such sinusoidal waves must have
frequencies within a certain range. For humans this range is roughly 20 cycles per second (cps) to 20,000 cps.
Frequencies outside this range will not stimulate the hairs within the cochlea enough to produce nerve signals.


Figure 10.19.2 [Labels: q(t), ear, neural pathways to brain]


To a reasonable degree of accuracy, the ear is a linear system. This means that if a complex sound wave is a finite
sum of sinusoidal components of different amplitudes, frequencies, and phase angles, say,

$$q(t) = A_0 + A_1\sin(\omega_1 t - \delta_1) + A_2\sin(\omega_2 t - \delta_2) + \cdots + A_n\sin(\omega_n t - \delta_n) \tag{2}$$

then the response of the ear consists of nerve impulses along the same neural pathways that would be stimulated by
the individual components (Figure 10.19.3).


Figure 10.19.3


Let us now consider some periodic sound wave $p(t)$ with period T [i.e., $p(t) = p(t + T)$] that is not a finite sum
of sinusoidal waves. If we examine the response of the ear to such a periodic wave, we find that it is the same as
the response to some wave that is the sum of sinusoidal waves. That is, there is some sound wave $q(t)$ as given by
Equation 2 that produces the same response as $p(t)$, even though $p(t)$ and $q(t)$ are different functions of time.


We now want to determine the frequencies, amplitudes, and phase angles of the sinusoidal components of $q(t)$.
Because $q(t)$ produces the same response as the periodic wave $p(t)$, it is reasonable to expect that $q(t)$ has the
same period T as $p(t)$. This requires that each sinusoidal term in $q(t)$ have period T. Consequently, the frequencies
of the sinusoidal components must be integer multiples of the basic frequency $1/T$ of the function $p(t)$. Thus, the
$\omega_k$ in Equation 2 must be of the form

$$\omega_k = 2k\pi/T, \qquad k = 1, 2, \ldots$$

But because the ear cannot perceive sinusoidal waves with frequencies greater than 20,000 cps, we may omit those
values of k for which $\omega_k/2\pi = k/T$ is greater than 20,000. Thus, $q(t)$ is of the form

$$q(t) = A_0 + A_1\sin\left(\frac{2\pi t}{T} - \delta_1\right) + \cdots + A_n\sin\left(\frac{2n\pi t}{T} - \delta_n\right) \tag{3}$$

where n is the largest integer such that $n/T$ is not greater than 20,000.


We now turn our attention to the values of the amplitudes $A_0, A_1, \ldots, A_n$ and the phase angles $\delta_1, \delta_2, \ldots, \delta_n$ that
appear in Equation 3. There is some criterion by which the auditory system "picks" these values so that $q(t)$
produces the same response as $p(t)$. To examine this criterion, let us set

$$e(t) = p(t) - q(t)$$

If we consider $q(t)$ as an approximation to $p(t)$, then $e(t)$ is the error in this approximation, an error that the ear
cannot perceive. In terms of $e(t)$, the criterion for the determination of the amplitudes and the phase angles is that
the quantity

$$\int_0^T [e(t)]^2\,dt = \int_0^T [p(t) - q(t)]^2\,dt \tag{4}$$

be as small as possible. We cannot go into the physiological reasons for this, but we note that this expression is
proportional to the acoustic energy of the error wave $e(t)$ over one period. In other words, it is the energy of the
difference between the two sound waves $p(t)$ and $q(t)$ that determines whether the ear perceives any difference
between them. If this energy is as small as possible, then the two waves produce the same sensation of sound.
Mathematically, the function $q(t)$ in 4 is the least squares approximation to $p(t)$ from the vector space $C[0, T]$ of
continuous functions on the interval $[0, T]$. (See Section 6.6.)


Least squares approximations by continuous functions arise in a wide variety of engineering and scientific 
approximation problems. Apart from the acoustics problem just discussed, some other examples follow. 


1. Let $S(x)$ be the axial strain distribution in a uniform rod lying along the x-axis from x = 0 to x = l (Figure
10.19.4). The strain energy in the rod is proportional to the integral

$$\int_0^l [S(x)]^2\,dx$$

The closeness of an approximation $g(x)$ to $S(x)$ can be judged according to the strain energy of the difference
of the two strain distributions. That energy is proportional to

$$\int_0^l [S(x) - g(x)]^2\,dx$$

which is a least squares criterion.


2. Let $E(t)$ be a periodic voltage across a resistor in an electrical circuit (Figure 10.19.5). The electrical energy
transferred to the resistor during one period T is proportional to

$$\int_0^T [E(t)]^2\,dt$$

If $g(t)$ has the same period as $E(t)$ and is to be an approximation to $E(t)$, then the criterion of closeness might
be taken as the energy of the difference voltage. This is proportional to

$$\int_0^T [E(t) - g(t)]^2\,dt$$

which is again a least squares criterion.


3. Let $y(x)$ be the vertical displacement of a uniform flexible string whose equilibrium position is along the x-axis
from x = 0 to x = l (Figure 10.19.6). The elastic potential energy of the string is proportional to

$$\int_0^l [y(x)]^2\,dx$$

If $g(x)$ is to be an approximation to the displacement, then as before, the energy integral

$$\int_0^l [y(x) - g(x)]^2\,dx$$

determines a least squares criterion for the closeness of the approximation.


Figure 10.19.4 [Axial strain S(x) in a rod from x = 0 to x = l]


Figure 10.19.5 [Voltage E(t)]


Figure 10.19.6 [Displacement y(x)]


Least squares approximation is also used in situations where there is no a priori justification for its use, such as for 
approximating business cycles, population growth curves, sales curves, and so forth. It is used in these cases 
because of its mathematical simplicity. In general, if no other error criterion is immediately apparent for an 
approximation problem, the least squares criterion is the one most often chosen. 


The following result was obtained in Section 6.6. 


THEOREM 10.19.1 Minimizing Mean Square Error on [0, 2π]

If $f(t)$ is continuous on $[0, 2\pi]$, then the trigonometric function $g(t)$ of the form

$$g(t) = \tfrac{1}{2}a_0 + a_1\cos t + \cdots + a_n\cos nt + b_1\sin t + \cdots + b_n\sin nt$$

that minimizes the mean square error

$$\int_0^{2\pi} [f(t) - g(t)]^2\,dt$$

has coefficients

$$a_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\cos kt\,dt, \qquad k = 0, 1, 2, \ldots, n$$

$$b_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\sin kt\,dt, \qquad k = 1, 2, \ldots, n$$
a o 


If the original function $f(t)$ is defined over the interval $[0, T]$ instead of $[0, 2\pi]$, a change of scale will yield the
following result (see Exercise 8):


THEOREM 10.19.2 Minimizing Mean Square Error on [0, T]


If $f(t)$ is continuous on $[0, T]$, then the trigonometric function $g(t)$ of the form

$$g(t) = \tfrac{1}{2}a_0 + a_1\cos\frac{2\pi}{T}t + \cdots + a_n\cos\frac{2n\pi}{T}t + b_1\sin\frac{2\pi}{T}t + \cdots + b_n\sin\frac{2n\pi}{T}t$$

that minimizes the mean square error

$$\int_0^T [f(t) - g(t)]^2\,dt$$

has coefficients

$$a_k = \frac{2}{T}\int_0^T f(t)\cos\frac{2k\pi t}{T}\,dt, \qquad k = 0, 1, 2, \ldots, n$$

$$b_k = \frac{2}{T}\int_0^T f(t)\sin\frac{2k\pi t}{T}\,dt, \qquad k = 1, 2, \ldots, n$$


EXAMPLE 1 Least Squares Approximation to a Sound Wave


Let a sound wave $p(t)$ have a saw-tooth pattern with a basic frequency of 5000 cps (Figure 10.19.7).
Assume units are chosen so that the normal atmospheric pressure is at the zero level and the
maximum amplitude of the wave is A. The basic period of the wave is T = 1/5000 = .0002 second.
From t = 0 to t = T, the function $p(t)$ has the equation

$$p(t) = \frac{2A}{T}\left(\frac{T}{2} - t\right)$$

Theorem 10.19.2 then yields the following (verify):

$$a_0 = \frac{2}{T}\int_0^T p(t)\,dt = \frac{2}{T}\int_0^T \frac{2A}{T}\left(\frac{T}{2} - t\right)dt = 0$$

$$a_k = \frac{2}{T}\int_0^T p(t)\cos\frac{2k\pi t}{T}\,dt = \frac{2}{T}\int_0^T \frac{2A}{T}\left(\frac{T}{2} - t\right)\cos\frac{2k\pi t}{T}\,dt = 0, \qquad k = 1, 2, \ldots$$

$$b_k = \frac{2}{T}\int_0^T p(t)\sin\frac{2k\pi t}{T}\,dt = \frac{2}{T}\int_0^T \frac{2A}{T}\left(\frac{T}{2} - t\right)\sin\frac{2k\pi t}{T}\,dt = \frac{2A}{k\pi}, \qquad k = 1, 2, \ldots$$

We can now investigate how the sound wave $p(t)$ is perceived by the human ear. We note that
4/T = 20,000 cps, so we need only go up to k = 4 in the formulas above. The least squares
approximation to $p(t)$ is then

$$q(t) = \frac{2A}{\pi}\left[\sin\frac{2\pi}{T}t + \frac{1}{2}\sin\frac{4\pi}{T}t + \frac{1}{3}\sin\frac{6\pi}{T}t + \frac{1}{4}\sin\frac{8\pi}{T}t\right]$$

The four sinusoidal terms have frequencies of 5000, 10,000, 15,000, and 20,000 cps, respectively. In
Figure 10.19.8 we have plotted $p(t)$ and $q(t)$ over one period. Although $q(t)$ is not a very good
point-by-point approximation to $p(t)$, to the ear, both $p(t)$ and $q(t)$ produce the same sensation of
sound.
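
The coefficients in Example 1 can also be checked numerically. The Python sketch below is an illustration only: it approximates the integrals of Theorem 10.19.2 with the midpoint rule, takes A = 1 as an arbitrary choice of units, and uses hypothetical names throughout. The closed form it is compared against is $b_k = 2A/(k\pi)$ from the example.

```python
import math


def fourier_coefficients(f, T, n, samples=20000):
    """Approximate a_0..a_n and b_1..b_n of Theorem 10.19.2 by the midpoint rule.

    b[0] is an unused placeholder so that b[k] corresponds to b_k.
    """
    dt = T / samples
    ts = [(m + 0.5) * dt for m in range(samples)]
    values = [f(t) for t in ts]
    a, b = [], [0.0]
    for k in range(n + 1):
        w = 2.0 * k * math.pi / T
        a.append(2.0 / T * dt * sum(v * math.cos(w * t) for v, t in zip(values, ts)))
        if k >= 1:
            b.append(2.0 / T * dt * sum(v * math.sin(w * t) for v, t in zip(values, ts)))
    return a, b


A_MAX = 1.0          # maximum amplitude A (arbitrary units, an assumption)
BASE_FREQ = 5000     # basic frequency in cps
MAX_AUDIBLE = 20000  # upper limit of hearing in cps
T = 1.0 / BASE_FREQ  # basic period in seconds


def p(t):
    """Saw-tooth wave of Example 1 on 0 <= t <= T."""
    return 2.0 * A_MAX / T * (T / 2.0 - t)


n = MAX_AUDIBLE // BASE_FREQ  # number of audible harmonics
a, b = fourier_coefficients(p, T, n)


def q(t):
    """Least squares approximation built from the computed coefficients."""
    total = 0.5 * a[0]
    for k in range(1, n + 1):
        w = 2.0 * k * math.pi / T
        total += a[k] * math.cos(w * t) + b[k] * math.sin(w * t)
    return total


for k in range(1, n + 1):
    print(k, round(b[k], 6), round(2.0 * A_MAX / (k * math.pi), 6))
```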


Figure 10.19.7


Figure 10.19.8 [Horizontal axis t, with T = .0002 marked]


As discussed in Section 6.6, the least squares approximation becomes better as the number of terms in the
approximating trigonometric polynomial becomes larger. More precisely,

$$\int_0^{2\pi}\left[f(t) - \tfrac{1}{2}a_0 - \sum_{k=1}^{n}(a_k\cos kt + b_k\sin kt)\right]^2 dt$$

tends to zero as n approaches infinity. We denote this by writing

$$f(t) \sim \tfrac{1}{2}a_0 + \sum_{k=1}^{\infty}(a_k\cos kt + b_k\sin kt)$$

where the right side of this equation is the Fourier series of $f(t)$. Whether the Fourier series of $f(t)$ converges to
$f(t)$ for each t is another question, and a more difficult one. For most continuous functions encountered in
applications, the Fourier series does indeed converge to its corresponding function for each value of t.
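
The decrease of the mean square error can be observed numerically. The Python sketch below is an illustration under stated assumptions: it uses the test function f(t) = t on [0, 2π], which is not taken from the text, approximates the integrals of Theorem 10.19.1 with the midpoint rule, and uses hypothetical function names.

```python
import math


def trig_coefficients(f, n, samples=4000):
    """Approximate a_0..a_n and b_1..b_n of Theorem 10.19.1 (b[0] is unused)."""
    dt = 2.0 * math.pi / samples
    ts = [(m + 0.5) * dt for m in range(samples)]
    values = [f(t) for t in ts]
    a = [dt / math.pi * sum(v * math.cos(k * t) for v, t in zip(values, ts))
         for k in range(n + 1)]
    b = [0.0] + [dt / math.pi * sum(v * math.sin(k * t) for v, t in zip(values, ts))
                 for k in range(1, n + 1)]
    return a, b


def mean_square_error(f, a, b, n, samples=4000):
    """Approximate the integral of [f(t) - g(t)]^2 over [0, 2*pi] for order n."""
    dt = 2.0 * math.pi / samples
    total = 0.0
    for m in range(samples):
        t = (m + 0.5) * dt
        g = 0.5 * a[0] + sum(a[k] * math.cos(k * t) + b[k] * math.sin(k * t)
                             for k in range(1, n + 1))
        total += (f(t) - g) ** 2 * dt
    return total


def f(t):
    return t  # test function chosen for illustration only


a, b = trig_coefficients(f, 16)
errors = [mean_square_error(f, a, b, n) for n in (1, 4, 16)]
print([round(e, 3) for e in errors])
```

For this f the error shrinks slowly because the periodic extension of f has a jump at the ends of the interval; smoother periodic functions converge faster.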


Exercise Set 10.19 


1. Find the trigonometric polynomial of order 3 that is the least squares approximation to the function
$f(t) = (t - \pi)^2$ over the interval $[0, 2\pi]$.


Answer:

$$\frac{\pi^2}{3} + 4\cos t + \cos 2t + \frac{4}{9}\cos 3t$$

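2. Find the trigonometric polynomial of order 4 that is the least squares approximation to the function $f(t) = t^2$
over the interval $[0, T]$.


Answer:

$$\frac{T^2}{3} + \frac{T^2}{\pi^2}\left(\cos\frac{2\pi}{T}t + \frac{1}{2^2}\cos\frac{4\pi}{T}t + \frac{1}{3^2}\cos\frac{6\pi}{T}t + \frac{1}{4^2}\cos\frac{8\pi}{T}t\right) - \frac{T^2}{\pi}\left(\sin\frac{2\pi}{T}t + \frac{1}{2}\sin\frac{4\pi}{T}t + \frac{1}{3}\sin\frac{6\pi}{T}t + \frac{1}{4}\sin\frac{8\pi}{T}t\right)$$


3. Find the trigonometric polynomial of order 4 that is the least squares approximation to the function $f(t)$ over
the interval $[0, 2\pi]$, where

$$f(t) = \begin{cases} \sin t, & 0 \le t \le \pi \\ 0, & \pi < t \le 2\pi \end{cases}$$

Answer:

$$\frac{1}{\pi} + \frac{1}{2}\sin t - \frac{2}{3\pi}\cos 2t - \frac{2}{15\pi}\cos 4t$$
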
4. Find the trigonometric polynomial of arbitrary order n that is the least squares approximation to the function
$f(t) = \sin\tfrac{1}{2}t$ over the interval $[0, 2\pi]$.

Answer:

$$\frac{4}{\pi}\left(\frac{1}{2} - \frac{1}{1\cdot 3}\cos t - \frac{1}{3\cdot 5}\cos 2t - \frac{1}{5\cdot 7}\cos 3t - \cdots - \frac{1}{(2n - 1)(2n + 1)}\cos nt\right)$$


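5. Find the trigonometric polynomial of arbitrary order n that is the least squares approximation to the function
$f(t)$ over the interval $[0, T]$, where

$$f(t) = \begin{cases} t, & 0 \le t \le \tfrac{1}{2}T \\ T - t, & \tfrac{1}{2}T < t \le T \end{cases}$$

Answer:

$$\frac{T}{4} - \frac{8T}{\pi^2}\left(\frac{1}{2^2}\cos\frac{2\pi t}{T} + \frac{1}{6^2}\cos\frac{6\pi t}{T} + \frac{1}{10^2}\cos\frac{10\pi t}{T} + \cdots + \frac{1}{(2n)^2}\cos\frac{2n\pi t}{T}\right)$$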


6. For the inner product

$$\langle \mathbf{u}, \mathbf{v}\rangle = \int_0^{2\pi} u(t)v(t)\,dt$$

show that
(a) $\|1\| = \sqrt{2\pi}$
(b) $\|\cos kt\| = \sqrt{\pi}$ for k = 1, 2, ...
(c) $\|\sin kt\| = \sqrt{\pi}$ for k = 1, 2, ...

7. Show that the 2n + 1 functions
1, cos t, cos 2t, ..., cos nt, sin t, sin 2t, ..., sin nt
are orthogonal over the interval $[0, 2\pi]$ relative to the inner product $\langle \mathbf{u}, \mathbf{v}\rangle$ defined in Exercise 6.


8. If $f(t)$ is defined and continuous on the interval $[0, T]$, show that $f(T\tau/2\pi)$ is defined and continuous for $\tau$
in the interval $[0, 2\pi]$. Use this fact to show how Theorem 10.19.2 follows from Theorem 10.19.1.


Section 10.19 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a 
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic 
proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be 
able to use your technology utility to solve many of the problems in the regular exercise sets. 


T1. Let g be the function

$$g(t) = \frac{3 + 4\sin t}{5 - 4\cos t}$$

for $0 \le t \le 2\pi$. Use a computer to determine the Fourier coefficients

$$\begin{Bmatrix} a_k \\ b_k \end{Bmatrix} = \frac{1}{\pi}\int_0^{2\pi}\left(\frac{3 + 4\sin t}{5 - 4\cos t}\right)\begin{Bmatrix}\cos kt \\ \sin kt\end{Bmatrix} dt$$

for k = 0, 1, 2, 3, 4, 5. From your results, make a conjecture about the general expressions for $a_k$ and $b_k$. Test your
conjecture by calculating

$$\tfrac{1}{2}a_0 + \sum_{k=1}^{\infty}(a_k\cos kt + b_k\sin kt)$$

on the computer and see whether it converges to $g(t)$.


T2. Let g be the function

$$g(t) = e^{\cos t}[\cos(\sin t) + \sin(\sin t)]$$

for $0 \le t \le 2\pi$. Use a computer to determine the Fourier coefficients

$$\begin{Bmatrix} a_k \\ b_k \end{Bmatrix} = \frac{1}{\pi}\int_0^{2\pi} g(t)\begin{Bmatrix}\cos kt \\ \sin kt\end{Bmatrix} dt$$

for k = 0, 1, 2, 3, 4, 5. From your results, make a conjecture about the general expressions for $a_k$ and $b_k$. Test your
conjecture by calculating

$$\tfrac{1}{2}a_0 + \sum_{k=1}^{\infty}(a_k\cos kt + b_k\sin kt)$$

on the computer and see whether it converges to $g(t)$.




10.20 Warps and Morphs 


Among the more interesting image-manipulation techniques available for computer graphics are warps and 
morphs. In this section we show how linear transformations can be used to distort a single picture to produce 
a warp, or to distort and blend two pictures to produce a morph. 


Prerequisites 


Geometry of Linear Operators on $R^2$ (Section 4.11)


Linear Independence


Bases in $R^2$


Computer graphics software enables you to manipulate an image in various ways, such as by scaling, rotating, 
or slanting the image. Distorting an image by separately moving the corners of a rectangle containing the 
image is another basic image-manipulation technique. Distorting various pieces of an image in different ways 
is a more complicated procedure that results in a warp of the picture. In addition, warping two different
images in complementary ways and blending the warps results in a morph of the two pictures (from the Greek
root meaning "shape" or "form"). An example is Figure 10.20.1 in which four photographs of a woman taken
over a 50-year period (the four diagonal pictures from top left to bottom right) have been pairwise morphed 
by different amounts to suggest the gradual aging of the woman. 


Figure 10.20.1 


The most visible application of warping and morphing images has been the production of special effects in 
motion pictures and television. However, many scientific and technological applications of such techniques 
have also arisen—for example, studying the evolution, growth, and development of living organisms, 
assisting in reconstructive and cosmetic surgery, exploring various designs of a product, and “aging” 
photographs of missing persons or police suspects. 


Warps 
We begin by describing a simple warp of a triangular region in the plane. Let the three vertices of a triangle be
given by the three noncollinear points $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ (Figure 10.20.2a). We will call this triangle the
begin-triangle. If $\mathbf{v}$ is any point in the begin-triangle, then there are unique constants $c_1$ and $c_2$ such that

$$\mathbf{v} - \mathbf{v}_3 = c_1(\mathbf{v}_1 - \mathbf{v}_3) + c_2(\mathbf{v}_2 - \mathbf{v}_3) \tag{1}$$

Equation 1 expresses the vector $\mathbf{v} - \mathbf{v}_3$ as a (unique) linear combination of the two linearly independent
vectors $\mathbf{v}_1 - \mathbf{v}_3$ and $\mathbf{v}_2 - \mathbf{v}_3$ with respect to an origin at $\mathbf{v}_3$. If we set $c_3 = 1 - c_1 - c_2$, then we can rewrite 1
as

$$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 \tag{2}$$

where

$$c_1 + c_2 + c_3 = 1 \tag{3}$$

from the definition of $c_3$. We say that $\mathbf{v}$ is a convex combination of the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ if 2 and 3 are
satisfied and, in addition, the coefficients $c_1$, $c_2$, and $c_3$ are nonnegative. It can be shown (Exercise 6) that $\mathbf{v}$
lies in the triangle determined by $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ if and only if it is a convex combination of those three
vectors.
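
The coefficients in Equations 1 to 3 can be computed by solving a 2 × 2 linear system. The Python sketch below is one way to do this, using Cramer's rule; the function names and the sample triangle are hypothetical and chosen only for illustration.

```python
def convex_coefficients(v, v1, v2, v3):
    """Solve Equation 1 for c1 and c2 by Cramer's rule; c3 = 1 - c1 - c2."""
    ax, ay = v1[0] - v3[0], v1[1] - v3[1]  # v1 - v3
    bx, by = v2[0] - v3[0], v2[1] - v3[1]  # v2 - v3
    px, py = v[0] - v3[0], v[1] - v3[1]    # v - v3
    det = ax * by - bx * ay
    if det == 0:
        raise ValueError("v1, v2, v3 are collinear")
    c1 = (px * by - bx * py) / det
    c2 = (ax * py - px * ay) / det
    return c1, c2, 1.0 - c1 - c2


def in_triangle(v, v1, v2, v3):
    """A point lies in the triangle exactly when all three coefficients are >= 0."""
    return min(convex_coefficients(v, v1, v2, v3)) >= 0.0


# Sample triangle (an arbitrary choice for illustration).
tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
print(convex_coefficients((0.25, 0.25), *tri))
print(in_triangle((0.25, 0.25), *tri), in_triangle((1.0, 1.0), *tri))
```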


Figure 10.20.2 [(a) begin-triangle with v = c1v1 + c2v2 + c3v3; (b) end-triangle with w = c1w1 + c2w2 + c3w3]


Next, given three noncollinear points $\mathbf{w}_1$, $\mathbf{w}_2$, and $\mathbf{w}_3$ of an end-triangle (Figure 10.20.2b), there is a unique
affine transformation that maps $\mathbf{v}_1$ to $\mathbf{w}_1$, $\mathbf{v}_2$ to $\mathbf{w}_2$, and $\mathbf{v}_3$ to $\mathbf{w}_3$. That is, there is a unique 2 × 2 invertible
matrix M and a unique vector $\mathbf{b}$ such that

$$\mathbf{w}_i = M\mathbf{v}_i + \mathbf{b} \quad \text{for } i = 1, 2, 3 \tag{4}$$

(See Exercise 5 for the evaluation of M and $\mathbf{b}$.) Moreover, it can be shown (Exercise 3) that the image $\mathbf{w}$ of the
vector $\mathbf{v}$ in 2 under this affine transformation is

$$\mathbf{w} = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3 \tag{5}$$

This is a basic property of affine transformations: They map a convex combination of vectors to the same
convex combination of the images of the vectors.
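
Equations 4 and 5 can be checked on a concrete pair of triangles. The Python sketch below is one possible construction and is not necessarily the one intended in Exercise 5: it subtracts the third equation in 4 from the first two, which gives M(v1 − v3) = w1 − w3 and M(v2 − v3) = w2 − w3, solves for M, and then sets b = w3 − M v3. The function names and sample points are hypothetical.

```python
def affine_from_triangles(begin, end):
    """Return (M, b) with w_i = M v_i + b for the three vertex pairs."""
    (v1, v2, v3), (w1, w2, w3) = begin, end
    # Columns of V are v1 - v3 and v2 - v3; columns of W are w1 - w3 and w2 - w3.
    v11, v21 = v1[0] - v3[0], v1[1] - v3[1]
    v12, v22 = v2[0] - v3[0], v2[1] - v3[1]
    w11, w21 = w1[0] - w3[0], w1[1] - w3[1]
    w12, w22 = w2[0] - w3[0], w2[1] - w3[1]
    det = v11 * v22 - v12 * v21
    if det == 0:
        raise ValueError("begin-triangle vertices are collinear")
    # M = W V^{-1}, with V^{-1} = (1/det) [[v22, -v12], [-v21, v11]].
    M = [[(w11 * v22 - w12 * v21) / det, (w12 * v11 - w11 * v12) / det],
         [(w21 * v22 - w22 * v21) / det, (w22 * v11 - w21 * v12) / det]]
    b = (w3[0] - (M[0][0] * v3[0] + M[0][1] * v3[1]),
         w3[1] - (M[1][0] * v3[0] + M[1][1] * v3[1]))
    return M, b


def apply_affine(M, b, v):
    return (M[0][0] * v[0] + M[0][1] * v[1] + b[0],
            M[1][0] * v[0] + M[1][1] * v[1] + b[1])


def combine(coeffs, points):
    """Return c1*p1 + c2*p2 + c3*p3."""
    return (sum(c * p[0] for c, p in zip(coeffs, points)),
            sum(c * p[1] for c, p in zip(coeffs, points)))


# Sample triangles (arbitrary choices for illustration).
begin = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
end = ((1.0, 1.0), (3.0, 1.0), (1.0, 4.0))
M, b = affine_from_triangles(begin, end)

coeffs = (0.5, 0.25, 0.25)            # a convex combination
v = combine(coeffs, begin)            # Equation 2
print(apply_affine(M, b, v), combine(coeffs, end))  # both sides of Equation 5
```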


Now suppose that the begin-triangle contains a picture within it (Figure 10.20.3a). That is, to each point in the
begin-triangle we assign a gray level, say 0 for white and 100 for black, with any other gray level lying
between 0 and 100. In particular, let a scalar-valued function $\rho_0$, called the picture-density of the
begin-triangle, be defined so that $\rho_0(\mathbf{v})$ is the gray level at the point $\mathbf{v}$ in the begin-triangle. We can now define a
picture in the end-triangle, called a warp of the original picture, with a picture-density $\rho_1$ by defining the gray
level at the point $\mathbf{w}$ within the end-triangle to be the gray level of the point $\mathbf{v}$ in the begin-triangle that maps
onto $\mathbf{w}$. In equation form, the picture-density $\rho_1$ is determined by

$$\rho_1(\mathbf{w}) = \rho_0(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3) \tag{6}$$

In this way, as $c_1$, $c_2$, and $c_3$ vary over all nonnegative values that add to one, 5 generates all points $\mathbf{w}$ in the
end-triangle, and 6 generates the gray levels $\rho_1(\mathbf{w})$ of the warped picture at those points (Figure 10.20.3b).
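
Equation 6 translates directly into a procedure for evaluating the warped picture at a given point. The Python sketch below is a minimal illustration: the picture-density is passed in as a function, points outside the end-triangle return None, and the function names, sample triangles, and sample density are all hypothetical.

```python
def convex_coefficients(p, t1, t2, t3):
    """Coefficients c1, c2, c3 with p = c1*t1 + c2*t2 + c3*t3 and c1 + c2 + c3 = 1."""
    ax, ay = t1[0] - t3[0], t1[1] - t3[1]
    bx, by = t2[0] - t3[0], t2[1] - t3[1]
    px, py = p[0] - t3[0], p[1] - t3[1]
    det = ax * by - bx * ay
    if det == 0:
        raise ValueError("triangle vertices are collinear")
    c1 = (px * by - bx * py) / det
    c2 = (ax * py - px * ay) / det
    return c1, c2, 1.0 - c1 - c2


def warped_density(rho0, begin_tri, end_tri, w):
    """Gray level of the warped picture at w (Equation 6), or None outside the end-triangle."""
    coeffs = convex_coefficients(w, *end_tri)
    if min(coeffs) < 0.0:
        return None
    # The same coefficients locate the point v in the begin-triangle that maps onto w.
    v = (sum(c * t[0] for c, t in zip(coeffs, begin_tri)),
         sum(c * t[1] for c, t in zip(coeffs, begin_tri)))
    return rho0(v)


def sample_density(v):
    """Gray level rising from 0 to 100 across the begin-triangle (illustration only)."""
    return 100.0 * v[0]


begin_tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
end_tri = ((1.0, 1.0), (3.0, 1.0), (1.0, 4.0))
print(warped_density(sample_density, begin_tri, end_tri, (1.5, 1.75)))
print(warped_density(sample_density, begin_tri, end_tri, (0.0, 0.0)))
```

Working backward from each point w, rather than forward from each v, means every point of the end-triangle receives a gray level exactly once.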


Figure 10.20.3 [(a) begin-triangle with v = c1v1 + c2v2 + c3v3; (b) end-triangle with w = c1w1 + c2w2 + c3w3 and ρ1(w) = ρ0(v)]


Equation 6 determines a very simple warp of a picture within a single triangle. More generally, we can break
up a picture into many triangular regions and warp each triangular region differently. This gives us much
freedom in designing a warp through our choice of triangular regions and how we change them. To this end,
suppose we are given a picture contained within some rectangular region of the plane. We choose n points
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ within the rectangle, which we call vertex points, so that they fall on key elements or features of
the picture we wish to warp (Figure 10.20.4a). Once the vertex points are chosen, we complete a
triangulation of the rectangular region; that is, we draw line segments between the vertex points in such a
way that we have the following conditions (Figure 10.20.4b):


1. The line segments form the sides of a set of triangles. 

2. The line segments do not intersect. 

3. Each vertex point is the vertex of at least one triangle. 

4. The union of the triangles is the rectangle. 

5. The set of triangles is maximal (i.e., no more vertices can be connected). 


Note that condition 4 requires that each corner of the rectangle containing the picture be a vertex point. 


Figure 10.20.4 [(a) vertex points; (b) and (c) two triangulations of the same vertex points]


One can always form a triangulation from any n vertex points, but the triangulation is not necessarily unique.
For example, Figures 10.20.4b and 10.20.4c are two different triangulations of the set of vertex points in
Figure 10.20.4a. Since there are various computer algorithms that perform triangulations very quickly, it is
not necessary to perform the tiresome triangulation task by hand; one need only specify the desired vertex
points and let a computer generate a triangulation from them. If n is the number of vertex points chosen, it can
be shown that the number of triangles m of any triangulation of those points is given by

$$m = 2n - 2 - k \tag{7}$$

where k is the number of vertex points lying on the boundary of the rectangle, including the four situated at
the corner points.


The warp is specified by moving the n vertex points $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ to new locations $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n$ according
to the changes we desire in the picture (Figures 10.20.5a and 10.20.5b). However, we impose two restrictions
on the movements of the vertex points:


1. The four vertex points at the corners of the rectangle are to remain fixed, and any vertex point on a side of 
the rectangle is to remain fixed or move to another point on the same side of the rectangle. All other vertex 
points are to remain in the interior of the rectangle. 


2. The triangles determined by the triangulation are not to overlap after their vertices have been moved. 


The first restriction guarantees that the rectangular shape of the begin-picture is preserved. The second 
restriction guarantees that the displaced vertex points still form a triangulation of the rectangle and that the 
new triangulation is similar to the original one. For example, Figure 10.20.5c is not an allowable movement 
of the vertex points shown in Figure 10.20.5a. Although a violation of this condition can be handled 
mathematically without too much additional effort, the resulting warps usually produce unnatural results and 
we will not consider them here. 


Figure 10.20.5 [Panels (a), (b), and (c)]


Figure 10.20.6 is a warp of a photograph of a woman using a triangulation with 94 vertex points and 179 
triangles. Note that the vertex points in the begin-triangulation are chosen to lie along key features of the 
picture (hairline, eyes, lips, etc.). These vertex points were moved to final positions corresponding to those 
same features in a picture of the woman taken 20 years after the begin-picture. Thus, the warped picture 
represents the woman forced into her older shape but using her younger gray levels. 


Figure 10.20.6 [Panels include the begin-picture, the begin-triangulation, and the warped triangulation]


Time-Varying Warps 


A time-varying warp is the set of warps generated when the vertex points of the begin-picture are moved
continually in time from their original positions to specified final positions. This gives us a motion picture in
which the begin-picture is continually warped to a final warp. Let us choose time units so that t = 0
corresponds to our begin-picture and t = 1 corresponds to our final warp. The simplest way of moving the
vertex points from time 0 to time 1 is with constant velocity along straight-line paths from their initial
positions to their final positions.


To describe such a motion, let $\mathbf{u}_i(t)$ denote the position of the ith vertex point at any time t between 0 and 1.
Thus $\mathbf{u}_i(0) = \mathbf{v}_i$ (its given position in the begin-picture) and $\mathbf{u}_i(1) = \mathbf{w}_i$ (its given position in the final warp).
In between, we determine its position by

$$\mathbf{u}_i(t) = (1 - t)\mathbf{v}_i + t\mathbf{w}_i \tag{8}$$

Note that 8 expresses $\mathbf{u}_i(t)$ as a convex combination of $\mathbf{v}_i$ and $\mathbf{w}_i$ for each t in [0, 1]. Figure 10.20.7
illustrates a time-varying triangulation of a plain rectangular region with six vertex points. The lines
connecting the vertex points at the different times are the space-time paths of these vertex points in this
space-time diagram.
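
Equation 8 amounts to a linear interpolation applied to every vertex point. A minimal Python sketch follows; the function name and the sample points are hypothetical.

```python
def vertex_positions(begin, end, t):
    """Positions u_i(t) = (1 - t) v_i + t w_i of all vertex points at time t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie between 0 and 1")
    return [((1.0 - t) * vx + t * wx, (1.0 - t) * vy + t * wy)
            for (vx, vy), (wx, wy) in zip(begin, end)]


# Two sample vertex points (arbitrary choices for illustration).
begin = [(0.0, 0.0), (2.0, 4.0)]
end = [(4.0, 0.0), (2.0, 0.0)]
for t in (0.0, 0.5, 1.0):
    print(t, vertex_positions(begin, end, t))
```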


Figure 10.20.7


Once the positions of the vertex points are computed at time t, a warp is performed between the begin-picture
and the triangulation at time t determined by the displaced vertex points at that time. Figure 10.20.8 shows a
time-varying warp at five values of t generated from the warp between t = 0 and t = 1 shown in Figure
10.20.6.


Figure 10.20.8 


Morphs 


A time-varying morph can be described as a blending of two time-varying warps of two different pictures
using two triangulations that match corresponding features in the two pictures. One of the two pictures is
designated as the begin-picture and the other as the end-picture. First, a time-varying warp from $t = 0$ to
$t = 1$ is generated in which the begin-picture is warped into the shape of the end-picture. Then a time-varying
warp from $t = 1$ to $t = 0$ is generated in which the end-picture is warped into the shape of the begin-picture.
Finally, a weighted average of the gray levels of the two warps at each time $t$ is produced to generate the
morph of the two images at time $t$.


Figure 10.20.9 shows two photographs of a woman taken 20 years apart. Below the pictures are two 
corresponding triangulations in which corresponding features of the two photographs are matched. The 
time-varying morph between these two pictures for five values of $t$ between 0 and 1 is shown in Figure
10.20.10.


[Panels: Begin-triangulation; End-triangulation]


Figure 10.20.9 


Figure 10.20.10 


The procedure for producing such a morph is outlined in the following nine steps (Figure 10.20.11): 


Step 1 Given a begin-picture with picture-density $\rho_0$ and an end-picture with picture-density $\rho_1$, position $n$
vertex points $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ in the begin-picture at key features of that picture.


Step 2 Position $n$ corresponding vertex points $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n$ in the end-picture at the corresponding key
features of that picture.


Step 3 Triangulate the begin- and end-pictures in similar ways by drawing lines between corresponding
vertex points in both pictures.


Step 4 For any time $t$ between 0 and 1, find the vertex points $\mathbf{u}_1(t), \mathbf{u}_2(t), \ldots, \mathbf{u}_n(t)$ in the morph picture at
that time, using the formula


$$\mathbf{u}_i(t) = (1 - t)\mathbf{v}_i + t\mathbf{w}_i, \quad i = 1, 2, \ldots, n \tag{9}$$


Step 5 Triangulate the morph picture at time $t$ similar to the begin- and end-picture triangulations.


Step 6 For any point $\mathbf{u}$ in the morph picture at time $t$, find the triangle in the triangulation of the morph
picture in which it lies and the vertices $\mathbf{u}_i(t)$, $\mathbf{u}_j(t)$, and $\mathbf{u}_k(t)$ of that triangle. (See Exercise 1 to
determine whether a given point lies in a given triangle.)


Step 7 Express $\mathbf{u}$ as a convex combination of $\mathbf{u}_i(t)$, $\mathbf{u}_j(t)$, and $\mathbf{u}_k(t)$ by finding the constants $c_i$, $c_j$, and
$c_k$ such that


$$\mathbf{u} = c_i\mathbf{u}_i(t) + c_j\mathbf{u}_j(t) + c_k\mathbf{u}_k(t) \tag{10}$$

and

$$c_i + c_j + c_k = 1 \tag{11}$$


Step 8 Determine the locations of the point $\mathbf{u}$ in the begin- and end-pictures using

$$\mathbf{v} = c_i\mathbf{v}_i + c_j\mathbf{v}_j + c_k\mathbf{v}_k \quad \text{(in the begin-picture)} \tag{12}$$

and

$$\mathbf{w} = c_i\mathbf{w}_i + c_j\mathbf{w}_j + c_k\mathbf{w}_k \quad \text{(in the end-picture)} \tag{13}$$


Step 9 Finally, determine the picture-density $\rho_t(\mathbf{u})$ of the morph-picture at the point $\mathbf{u}$ using

$$\rho_t(\mathbf{u}) = (1 - t)\rho_0(\mathbf{v}) + t\rho_1(\mathbf{w}) \tag{14}$$


Step 9 is the key step in distinguishing a warp from a morph. Equation 14 takes weighted averages of the gray
levels of the begin- and end-pictures to produce the gray levels of the morph-picture. The weights depend on
the fraction of the distances that the vertex points have moved from their beginning positions to their ending
positions. For example, if the vertex points have moved one-fourth of the way to their destinations (i.e., if
$t = 0.25$), then we use one-fourth of the gray levels of the end-picture and three-fourths of the gray levels of
the begin-picture. Thus, as time progresses, not only does the shape of the begin-picture gradually change into
the shape of the end-picture (as in a warp) but the gray levels of the begin-picture also gradually change into
the gray levels of the end-picture.
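

Steps 4 through 9 can be sketched for a single point in a single triangle as follows. This is only an
illustration under stated assumptions: the helper names (`barycentric`, `combine`, `morph_density`), the
representation of the two picture-densities as Python functions, and the sample triangles are all
hypothetical, and the search for the triangle that contains the point (Step 6) is assumed to have been
done already.

```python
def barycentric(u, a, b, c):
    """Solve u = ca*a + cb*b + cc*c with ca + cb + cc = 1 (Equations 10 and 11)."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    if det == 0:
        raise ValueError("triangle vertices are collinear")
    cb = ((u[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (u[1] - a[1])) / det
    cc = ((b[0] - a[0]) * (u[1] - a[1]) - (u[0] - a[0]) * (b[1] - a[1])) / det
    return 1.0 - cb - cc, cb, cc


def combine(coeffs, points):
    """Return the linear combination sum(c * p) of two-dimensional points."""
    return tuple(sum(c * p[k] for c, p in zip(coeffs, points)) for k in range(2))


def morph_density(u, t, tri_v, tri_w, rho0, rho1):
    """Picture-density of the morph at point u, which must lie in the given triangle."""
    tri_u = [combine((1 - t, t), (v, w)) for v, w in zip(tri_v, tri_w)]  # Equation 9
    coeffs = barycentric(u, *tri_u)                                      # Equations 10, 11
    if min(coeffs) < -1e-12:  # small tolerance for rounding on the boundary
        raise ValueError("u lies outside this triangle")
    v = combine(coeffs, tri_v)                                           # Equation 12
    w = combine(coeffs, tri_w)                                           # Equation 13
    return (1 - t) * rho0(v) + t * rho1(w)                               # Equation 14


# Hypothetical data: one matching pair of triangles and two made-up density functions.
tri_v = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
tri_w = [(0.0, 0.0), (8.0, 0.0), (0.0, 8.0)]
rho0 = lambda p: p[0]          # gray level of the begin-picture
rho1 = lambda p: 10.0 * p[1]   # gray level of the end-picture

sample = morph_density((3.0, 1.5), 0.5, tri_v, tri_w, rho0, rho1)
```

A full morph would repeat this computation for every pixel of every frame, which is the repetitive work
the procedure leaves to a computer.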


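[Diagram: Time = 0, begin-picture with given density $\rho_0(\mathbf{v})$; Time = $t$, morph-picture with computed
density $\rho_t(\mathbf{u}) = (1 - t)\rho_0(\mathbf{v}) + t\rho_1(\mathbf{w})$; Time = 1, end-picture with given density $\rho_1(\mathbf{w})$]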


Figure 10.20.11 


The procedure described above to generate a morph is cumbersome to perform by hand, but it is the kind of 
dull, repetitive procedure at which computers excel. A successful morph demands good preparation and 
requires more artistic ability than mathematical ability. (The software designer is required to have the 
mathematical ability.) The two photographs to be morphed should be carefully chosen so that they have 
matching features, and the vertex points in the two photographs also should be carefully chosen so that the 
triangles in the two resulting triangulations contain similar features of the two pictures. When the procedure is 
done correctly, each frame of the morph should look just as “real” as the begin- and end-pictures. 


The techniques we have discussed in this section can be generalized in numerous ways to produce much more 
elaborate warps and morphs. For example: 


1. If the pictures are in color, the three components of the picture colors (red, green, and blue) can be
morphed separately to produce a color morph.


2. Rather than following straight-line paths to their destinations, the vertices of a triangulation can be directed
separately along more complicated paths to produce a variety of results.


3. Rather than travel with constant speeds along their paths, the vertices of a triangulation can be directed to
have different speeds at different times. For example, in a morph between two faces, the hairline can be
made to change first, then the nose, and so forth.


4. Similarly, the gray-level mixing of the begin-picture and end-picture at different times and different
vertices can be varied in a more complicated way than that in Equation 14.


5. One can morph two surfaces in three-dimensional space (representing two complete heads, for example)
by triangulating the surfaces and using the techniques in this section.


6. One can morph two solids in three-dimensional space (for example, two three-dimensional tomographs of 
a beating human heart at two different times) by dividing the two solids into corresponding tetrahedral 
regions. 


7. Two film strips can be morphed frame by frame by different amounts between each pair of frames to 
produce a morphed film strip in which, say, an actor walking along a set is gradually morphed into an ape 
walking along the set. 


8. Instead of using straight lines to triangulate two pictures to be morphed, more complicated curves, such as 
spline curves, can be matched between the two pictures. 


9. Three or more pictures can be morphed together by generalizing the formulas given in this section. 


These and other generalizations have made warping and morphing two of the most active areas in computer 
graphics. 


Exercise Set 10.20 


1. Determine whether the vector $\mathbf{v}$ is a convex combination of the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. Do this by solving
Equations 1 and 3 for $c_1$, $c_2$, and $c_3$ and ascertaining whether these coefficients are nonnegative.


(a) v=|3} n= n=([5] »- [| 


 v=| 7} m=[t} ve=([5). v3=[3] 
(©) [5h n=|3} =|73| e=(9] 
@ *=[5} v= 3 ¥2= 23} = | 


(a) Yes; v= 1; | 245 f 2y, 


5 5 5 
(b) No; v= ral + ay7 _ 23 
(c) Yes; ¥ = ral + oy7 + Ov3 
(d) Yes; v= cad + ae + v3 


2. Verify Equation 7 for the two triangulations given in Figure 10.20.4. 
Answer: 


$m$ = number of triangles = 7, $n$ = number of vertex points = 7, $k$ = number of boundary vertex points
= 5; Equation 7 is $7 = 2(7) - 2 - 5$.

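3. Let an affine transformation be given by a $2 \times 2$ matrix $M$ and a two-dimensional vector $\mathbf{b}$. Let
$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$, where $c_1 + c_2 + c_3 = 1$; let $\mathbf{w} = M\mathbf{v} + \mathbf{b}$; and let $\mathbf{w}_i = M\mathbf{v}_i + \mathbf{b}$ for $i = 1, 2, 3$.
Show that $\mathbf{w} = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3$. (This shows that an affine transformation maps a convex
combination of vectors to the same convex combination of the images of the vectors.)


Answer:


$$\mathbf{w} = M\mathbf{v} + \mathbf{b} = M(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3) + (c_1 + c_2 + c_3)\mathbf{b}$$
$$= c_1(M\mathbf{v}_1 + \mathbf{b}) + c_2(M\mathbf{v}_2 + \mathbf{b}) + c_3(M\mathbf{v}_3 + \mathbf{b}) = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3$$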
4. (a) Exhibit a triangulation of the points in Figure 10.20.4 in which the points $\mathbf{v}_3$, $\mathbf{v}_5$, and $\mathbf{v}_6$ form the
vertices of a single triangle.


(b) Exhibit a triangulation of the points in Figure 10.20.4 in which the points $\mathbf{v}_3$, $\mathbf{v}_5$, and $\mathbf{v}_7$ do not form
the vertices of a single triangle.


Answer: 
(a) [Figure: triangulation diagram]
(b) [Figure: triangulation diagram]


5. Find the $2 \times 2$ matrix $M$ and two-dimensional vector $\mathbf{b}$ that define the affine transformation that maps the
three vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ to the three vectors $\mathbf{w}_1$, $\mathbf{w}_2$, and $\mathbf{w}_3$. Do this by setting up a system of six
linear equations for the four entries of the matrix $M$ and the two entries of the vector $\mathbf{b}$.


oecfspey fof) bl 


Answer: 


(c) M 1 0 pe 2 
1)’ —3 

(d) i | a 
M=| 2 , b=| 2 

2 0 —1 


6. (a) Let $\mathbf{a}$ and $\mathbf{b}$ be linearly independent vectors in the plane. Show that if $c_1$ and $c_2$ are nonnegative
numbers such that $c_1 + c_2 = 1$, then the vector $c_1\mathbf{a} + c_2\mathbf{b}$ lies on the line segment connecting the tips
of the vectors $\mathbf{a}$ and $\mathbf{b}$.


(b) Let $\mathbf{a}$ and $\mathbf{b}$ be linearly independent vectors in the plane. Show that if $c_1$ and $c_2$ are nonnegative
numbers such that $c_1 + c_2 \le 1$, then the vector $c_1\mathbf{a} + c_2\mathbf{b}$ lies in the triangle connecting the origin
and the tips of the vectors $\mathbf{a}$ and $\mathbf{b}$. [Hint: First examine the vector $c_1\mathbf{a} + c_2\mathbf{b}$ multiplied by the scale
factor $1 / (c_1 + c_2)$.]


(c) Let $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ be noncollinear points in the plane. Show that if $c_1$, $c_2$, and $c_3$ are nonnegative
numbers such that $c_1 + c_2 + c_3 = 1$, then the vector $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ lies in the triangle
connecting the tips of the three vectors. [Hint: Let $\mathbf{a} = \mathbf{v}_1 - \mathbf{v}_3$ and $\mathbf{b} = \mathbf{v}_2 - \mathbf{v}_3$, and then use
Equation 1 and part (b) of this exercise.]


7. (a) What can you say about the coefficients $c_1$, $c_2$, and $c_3$ that determine a convex combination
$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ if $\mathbf{v}$ lies on one of the three vertices of the triangle determined by the three
vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$?


(b) What can you say about the coefficients $c_1$, $c_2$, and $c_3$ that determine a convex combination
$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ if $\mathbf{v}$ lies on one of the three sides of the triangle determined by the three
vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$?


(c) What can you say about the coefficients $c_1$, $c_2$, and $c_3$ that determine a convex combination
$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ if $\mathbf{v}$ lies in the interior of the triangle determined by the three vectors $\mathbf{v}_1$, $\mathbf{v}_2$,
and $\mathbf{v}_3$?


Answer: 


(a) Two of the coefficients are zero. 
(b) At least one of the coefficients is zero. 


(c) None of the coefficients are zero. 


8. (a) The centroid of a triangle lies on the line segment connecting any one of the three vertices of the 
triangle with the midpoint of the opposite side. Its location on this line segment is two-thirds of the 
distance from the vertex. If the three vertices are given by the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, write the
centroid as a convex combination of these three vectors.


(b) Use your result in part (a) to find the vector defining the centroid of the triangle with the three vertices 


Stahl} 


Answer: 


(a) $\tfrac{1}{3}\mathbf{v}_1 + \tfrac{1}{3}\mathbf{v}_2 + \tfrac{1}{3}\mathbf{v}_3$


b) | 8/3 
| ; | 
Section 10.20 Technology Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be 
MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra 
software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to 
read the relevant documentation for the particular utility you are using. The goal of these exercises is to 
provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in 
these exercises, you will be able to use your technology utility to solve many of the problems in the regular 
exercise sets. 


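T1. To warp or morph a surface in $R^3$ we must be able to triangulate the surface. Let

$$\mathbf{v}_1 = \begin{bmatrix} v_{11} \\ v_{12} \\ v_{13} \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} v_{21} \\ v_{22} \\ v_{23} \end{bmatrix}, \quad \text{and} \quad \mathbf{v}_3 = \begin{bmatrix} v_{31} \\ v_{32} \\ v_{33} \end{bmatrix}$$

be three noncollinear vectors on the surface. Then a vector $\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$ lies in the
triangle formed by these three vectors if and only if $\mathbf{v}$ is a convex combination of the three vectors; that is,
$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ for some nonnegative coefficients $c_1$, $c_2$, and $c_3$ whose sum is 1.


(a) Show that in this case, $c_1$, $c_2$, and $c_3$ are solutions of the following linear system:


$$\begin{bmatrix} v_{11} & v_{21} & v_{31} \\ v_{12} & v_{22} & v_{32} \\ v_{13} & v_{23} & v_{33} \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ 1 \end{bmatrix}$$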
2 
In parts (b)—(d) determine whether the vector v is a convex combination of the vectors ¥j =| 7 |, 
=—5 
3 2 
v2=/O)|,andv3=] 2). 
9 —4 
ob), 5 
=a 9 
9 
© , E 
.=Z 9 
9 


r= ; —7 
50 
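T2. To warp or morph a solid object in $R^3$ we first partition the object into disjoint tetrahedrons. Let

$$\mathbf{v}_1 = \begin{bmatrix} v_{11} \\ v_{12} \\ v_{13} \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} v_{21} \\ v_{22} \\ v_{23} \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} v_{31} \\ v_{32} \\ v_{33} \end{bmatrix}, \quad \text{and} \quad \mathbf{v}_4 = \begin{bmatrix} v_{41} \\ v_{42} \\ v_{43} \end{bmatrix}$$

be four noncoplanar vectors. Then a vector $\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$ lies in the solid tetrahedron
formed by these four vectors if and only if $\mathbf{v}$ is a convex combination of the four vectors; that is,
$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 + c_4\mathbf{v}_4$ for some nonnegative coefficients $c_1$, $c_2$, $c_3$, and $c_4$ whose sum is one.


(a) Show that in this case, $c_1$, $c_2$, $c_3$, and $c_4$ are solutions of the following linear system:


$$\begin{bmatrix} v_{11} & v_{21} & v_{31} & v_{41} \\ v_{12} & v_{22} & v_{32} & v_{42} \\ v_{13} & v_{23} & v_{33} & v_{43} \\ 1 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ 1 \end{bmatrix}$$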
2 
In parts (b)—(d) determine whether the vector v is a convex combination of the vectors ¥1 = | —6 |, 
1 
=—3 z =1 
v2=|] 4),¥v3=]2),andvg=|] 3). 
2 3 2 
(b) 7 
v=/0 
7 
(c) 1 
v=/1 
2 
(d) 1 
v=/2 
2 




APPENDIX A


How to Read Theorems 


Since many of the most important concepts in linear algebra occur as theorem statements, it is important to be 
familiar with the various ways in which theorems can be structured. This appendix will help you to do that. 


Contrapositive Form of a Theorem 


The simplest theorems are of the form 


If $H$ is true, then $C$ is true. (1)


where $H$ is a statement, called the hypothesis, and $C$ is a statement, called the conclusion. The theorem is true
if the conclusion is true whenever the hypothesis is true, and the theorem is false if there is some case where
the hypothesis is true but the conclusion is false. It is common to denote a theorem of form 1 as

$$H \Rightarrow C \tag{2}$$

(read, "$H$ implies $C$"). As an example, the theorem

If $a$ and $b$ are both positive numbers, then $ab$ is a positive number. (3)


is of form 2, where


$H$ = $a$ and $b$ are both positive numbers (4)


$C$ = $ab$ is a positive number (5)


Sometimes it is desirable to phrase theorems in a negative way. For example, the theorem in 3 can be
rephrased equivalently as


If $ab$ is not a positive number, then $a$ and $b$ are not both positive numbers. (6)


If we write $\sim H$ to mean that $H$ is false and $\sim C$ to mean that $C$ is false, then the structure of the theorem in 6
is


$$\sim C \Rightarrow \sim H \tag{7}$$


In general, any theorem of form 2 can be rephrased in form 7, which is called the contrapositive of 2. If a
theorem is true, then so is its contrapositive, and vice versa.


Converse of a Theorem


The converse of a theorem is the statement that results when the hypothesis and conclusion are interchanged.
Thus, the converse of the theorem $H \Rightarrow C$ is the statement $C \Rightarrow H$. Whereas the contrapositive of a true
theorem must itself be a true theorem, the converse of a true theorem may or may not be true. For example,
the converse of 3 is the false statement

If $ab$ is a positive number, then $a$ and $b$ are both positive numbers.


but the converse of the true theorem

If $a > b$, then $2a > 2b$. (8)

is the true theorem

If $2a > 2b$, then $a > b$. (9)


Equivalent Statements


If a theorem $H \Rightarrow C$ and its converse $C \Rightarrow H$ are both true, then we say that $H$ and $C$ are equivalent
statements, which we denote by writing


$$H \Leftrightarrow C \tag{10}$$


(read, "$H$ and $C$ are equivalent"). There are various ways of phrasing equivalent statements as a single
theorem. Here are three ways in which 8 and 9 can be combined into a single theorem.


Form 1


If $a > b$, then $2a > 2b$, and conversely, if $2a > 2b$, then $a > b$.


Form 2


$a > b$ if and only if $2a > 2b$.


Form 3


The following statements are equivalent.
(i) $a > b$
(ii) $2a > 2b$
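

The relationships among a theorem, its contrapositive, and its converse can be checked mechanically
when $H$ and $C$ are treated as plain true/false values. The sketch below is an illustration only; it
assumes that simplified two-valued setting, and the helper name `implies` is hypothetical.

```python
from itertools import product


def implies(p, q):
    """Truth value of 'p implies q': false only when p is true and q is false."""
    return (not p) or q


# All four combinations of truth values for H and C.
rows = list(product([False, True], repeat=2))

# Form 2 and its contrapositive (form 7) have the same truth value in every row.
contrapositive_agrees = all(implies(h, c) == implies(not c, not h) for h, c in rows)

# Form 2 and its converse do not agree in every row.
converse_agrees = all(implies(h, c) == implies(c, h) for h, c in rows)

# H and C are equivalent exactly when both the theorem and its converse hold.
equivalence_is_both = all((h == c) == (implies(h, c) and implies(c, h)) for h, c in rows)
```

The converse fails to agree in the row where $H$ is true and $C$ is false, which is the same
situation that makes a theorem of form 1 false.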


Theorems Involving Three or More Statements 


Sometimes two true theorems will give you a third true theorem for free. Specifically, if $H \Rightarrow C$ is a true
theorem, and $C \Rightarrow D$ is a true theorem, then $H \Rightarrow D$ must also be a true theorem. For example, the theorems


If opposite sides of a quadrilateral are parallel, then the quadrilateral is a parallelogram.


and

Opposite sides of a parallelogram have equal lengths.


imply the third theorem

If opposite sides of a quadrilateral are parallel, then they have equal lengths.


Sometimes three theorems yield equivalent statements for free. For example, if

$$H \Rightarrow C, \quad C \Rightarrow D, \quad D \Rightarrow H \tag{11}$$

then we have the implication loop in Figure A.1 from which we can conclude that

$$C \Rightarrow H, \quad D \Rightarrow C, \quad H \Rightarrow D \tag{12}$$

Combining this with 11 we obtain

$$H \Leftrightarrow C, \quad C \Leftrightarrow D, \quad D \Leftrightarrow H \tag{13}$$


In summary, if you want to prove the three equivalences in 13, you need only prove the three implications in
11.


Figure A.1 




APPENDIX B


Complex Numbers 


Complex numbers arise naturally in the course of solving polynomial equations. For example, the solutions of
the quadratic equation $ax^2 + bx + c = 0$, which are given by the quadratic formula


$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

are complex numbers if the expression inside the radical is negative. In this appendix we will review some of
the basic ideas about complex numbers that are used in this text.


Complex Numbers 


To deal with the problem that the equation $x^2 = -1$ has no real solutions, mathematicians of the eighteenth
century invented the "imaginary" number

$$i = \sqrt{-1}$$

which is assumed to have the property

$$i^2 = (\sqrt{-1})^2 = -1$$

but which otherwise has the algebraic properties of a real number. An expression of the form

$$a + bi \quad \text{or} \quad a + ib$$

in which $a$ and $b$ are real numbers is called a complex number. Sometimes it will be convenient to use a
single letter, typically $z$, to denote a complex number, in which case we write

$$z = a + bi \quad \text{or} \quad z = a + ib$$

The number $a$ is called the real part of $z$ and is denoted by $\operatorname{Re}(z)$, and the number $b$ is called the imaginary
part of $z$ and is denoted by $\operatorname{Im}(z)$. Thus,


$$\operatorname{Re}(3 + 2i) = 3, \quad \operatorname{Im}(3 + 2i) = 2$$
$$\operatorname{Re}(1 - 5i) = 1, \quad \operatorname{Im}(1 - 5i) = \operatorname{Im}(1 + (-5)i) = -5$$
$$\operatorname{Re}(7i) = \operatorname{Re}(0 + 7i) = 0, \quad \operatorname{Im}(7i) = 7$$
$$\operatorname{Re}(4) = 4, \quad \operatorname{Im}(4) = \operatorname{Im}(4 + 0i) = 0$$


Two complex numbers are considered equal if and only if their real parts are equal and their imaginary parts 
are equal; that is, 


$$a + bi = c + di \quad \text{if and only if} \quad a = c \text{ and } b = d$$


A complex number $z = bi$ whose real part is zero is said to be pure imaginary. A complex number $z = a$
whose imaginary part is zero is a real number, so the real numbers can be viewed as a subset of the complex
numbers.


Complex numbers are added, subtracted, and multiplied in accordance with the standard rules of algebra but
with $i^2 = -1$:


$$(a + bi) + (c + di) = (a + c) + (b + d)i \tag{1}$$

$$(a + bi) - (c + di) = (a - c) + (b - d)i \tag{2}$$

$$(a + bi)(c + di) = (ac - bd) + (ad + bc)i \tag{3}$$

The multiplication formula is obtained by expanding the left side and using the fact that $i^2 = -1$. Also note
that if $b = 0$, then the multiplication formula simplifies to

$$a(c + di) = ac + adi \tag{4}$$


The set of complex numbers with these operations is commonly denoted by the symbol $C$ and is called the
complex number system.


EXAMPLE 1 Multiplying Complex Numbers 


As a practical matter, it is usually more convenient to compute products of complex numbers by 
expansion, rather than substituting in 3. For example, 


$$(3 - 2i)(4 + 5i) = 12 + 15i - 8i - 10i^2 = (12 + 10) + 7i = 22 + 7i$$
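

The product in this example can be double-checked numerically. The sketch below applies Formula 3
directly and compares it with Python's built-in `complex` type, which writes the imaginary unit as `j`;
the function name `multiply` is a hypothetical helper chosen for this illustration.

```python
def multiply(a, b, c, d):
    """Return (real part, imaginary part) of (a + bi)(c + di) using Formula 3."""
    return (a * c - b * d, a * d + b * c)


# (3 - 2i)(4 + 5i) computed from Formula 3.
product_pair = multiply(3, -2, 4, 5)

# The same product using Python's built-in complex arithmetic.
builtin_product = complex(3, -2) * complex(4, 5)
```

Both computations give a real part of 22 and an imaginary part of 7, in agreement with the expansion.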


The Complex Plane 


A complex number $z = a + bi$ can be associated with the ordered pair $(a, b)$ of real numbers and represented
geometrically by a point or a vector in the $xy$-plane (Figure B.1). We call this the complex plane. Points on
the $x$-axis have an imaginary part of zero and hence correspond to real numbers, whereas points on the $y$-axis
have a real part of zero and correspond to pure imaginary numbers. Accordingly, we call the $x$-axis the real
axis and the $y$-axis the imaginary axis (Figure B.2).


[Figure label: $a + bi$]


Figure B.1 


[Axis labels: real axis (real part of $z$, $a$); imaginary axis (imaginary part of $z$, $b$)]

Figure B.2 


Complex numbers can be added, subtracted, or multiplied by real numbers geometrically by performing these
operations on their associated vectors (Figure B.3, for example). In this sense the complex number system $C$
is closely related to $R^2$, the main difference being that complex numbers can be multiplied to produce other
complex numbers, whereas there is no multiplication operation on $R^2$ that produces other vectors in $R^2$ (the
dot product produces a scalar, not a vector in $R^2$).


[Panels: The sum of two complex numbers; The difference of two complex numbers]


Figure B.3 


If $z = a + bi$ is a complex number, then the complex conjugate of $z$, or more simply, the conjugate of $z$, is
denoted by $\bar{z}$ (read, "$z$ bar") and is defined by


$$\bar{z} = a - bi \tag{5}$$


Numerically, $\bar{z}$ is obtained from $z$ by reversing the sign of the imaginary part, and geometrically it is obtained
by reflecting the vector for $z$ about the real axis (Figure B.4).


Figure B.4 


EXAMPLE 2 Some Complex Conjugates 


$z = 3 + 4i$, $\bar{z} = 3 - 4i$
$z = -2 - 5i$, $\bar{z} = -2 + 5i$
$z = i$, $\bar{z} = -i$
$z = 7$, $\bar{z} = 7$


Remark The last computation in this example illustrates the fact that a real number is equal to its complex
conjugate. More generally, $z = \bar{z}$ if and only if $z$ is a real number.


The following computation shows that the product of a complex number $z = a + bi$ and its conjugate
$\bar{z} = a - bi$ is a nonnegative real number:


$$z\bar{z} = (a + bi)(a - bi) = a^2 - abi + bai - b^2 i^2 = a^2 + b^2 \tag{6}$$


You will recognize that


$$\sqrt{z\bar{z}} = \sqrt{a^2 + b^2}$$


is the length of the vector corresponding to $z$ (Figure B.5); we call this length the modulus (or absolute value)
of $z$ and denote it by $|z|$. Thus,

$$|z| = \sqrt{z\bar{z}} = \sqrt{a^2 + b^2} \tag{7}$$


Note that if $b = 0$, then $z = a$ is a real number and $|z| = \sqrt{a^2} = |a|$, which tells us that the modulus of a real
number is the same as its absolute value as defined in beginning algebra.


Figure B.5 


EXAMPLE 3 Some Modulus Computations 


$z = 3 + 4i$, $|z| = \sqrt{3^2 + 4^2} = 5$
$z = i$, $|z| = \sqrt{0^2 + 1^2} = 1$


Reciprocals and Division 
If $z \neq 0$, then the reciprocal (or multiplicative inverse) of $z$ is denoted by $1/z$ (or $z^{-1}$) and is defined by the
property

$$\left(\frac{1}{z}\right) z = 1$$


This equation has a unique solution for $1/z$, which we can obtain by multiplying both sides by $\bar{z}$ and using
the fact that $z\bar{z} = |z|^2$ [see 7]. This yields


$$\frac{1}{z} = \frac{\bar{z}}{|z|^2} \tag{8}$$

If $z_2 \neq 0$, then the quotient $z_1 / z_2$ is defined to be the product of $z_1$ and $1 / z_2$. This yields the formula

$$\frac{z_1}{z_2} = \frac{\bar{z}_2}{|z_2|^2}\, z_1 = \frac{z_1 \bar{z}_2}{|z_2|^2} \tag{9}$$


Observe that the expression on the right side of 9 results if the numerator and denominator of $z_1 / z_2$ are
multiplied by $\bar{z}_2$. As a practical matter, this is often the best way to perform divisions of complex numbers.


EXAMPLE 4 Division of Complex Numbers 
Let $z_1 = 3 + 4i$ and $z_2 = 1 - 2i$. Express $z_1 / z_2$ in the form $a + bi$.


Solution We will multiply the numerator and denominator of $z_1 / z_2$ by $\bar{z}_2$. This yields

$$\frac{z_1}{z_2} = \frac{z_1 \bar{z}_2}{z_2 \bar{z}_2} = \frac{3 + 4i}{1 - 2i} \cdot \frac{1 + 2i}{1 + 2i} = \frac{3 + 6i + 4i + 8i^2}{1 - 4i^2} = \frac{-5 + 10i}{5} = -1 + 2i$$
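

The same division can be sketched in Python by applying Formula 9 to the built-in `complex` type. The
function name `divide` is a hypothetical helper used only for this illustration.

```python
def divide(z1, z2):
    """Return z1 / z2 as z1 * conj(z2) / |z2|^2 (Formula 9)."""
    modulus_squared = z2.real ** 2 + z2.imag ** 2   # |z2|^2 = z2 * conj(z2)
    if modulus_squared == 0:
        raise ZeroDivisionError("z2 must be nonzero")
    return z1 * z2.conjugate() / modulus_squared


z1 = complex(3, 4)    # 3 + 4i
z2 = complex(1, -2)   # 1 - 2i
quotient = divide(z1, z2)
```

For these inputs the numerator $z_1 \bar{z}_2$ is $-5 + 10i$ and $|z_2|^2$ is 5, matching the hand
computation in the example.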


The following theorems list some useful properties of the modulus and conjugate operations. 


THEOREM B.1 


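The following results hold for any complex numbers $z$, $z_1$, and $z_2$.
(a) $\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2$

(b) $\overline{z_1 - z_2} = \bar{z}_1 - \bar{z}_2$

(c) $\overline{z_1 z_2} = \bar{z}_1 \bar{z}_2$

(d) $\overline{z_1 / z_2} = \bar{z}_1 / \bar{z}_2$

(e) $\bar{\bar{z}} = z$


THEOREM B.2


The following results hold for any complex numbers $z$, $z_1$, and $z_2$.
(a) $|\bar{z}| = |z|$

(b) $|z_1 z_2| = |z_1||z_2|$

(c) $|z_1 / z_2| = |z_1| / |z_2|$

(d) $|z_1 + z_2| \le |z_1| + |z_2|$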


Polar Form of a Complex Number 


If $z = a + bi$ is a nonzero complex number, and if $\phi$ is an angle from the real axis to the vector $z$, then, as
suggested in Figure B.6, the real and imaginary parts of $z$ can be expressed as


$$a = |z| \cos\phi \quad \text{and} \quad b = |z| \sin\phi \tag{10}$$

Thus, the complex number $z = a + bi$ can be expressed as

$$z = |z|(\cos\phi + i \sin\phi) \tag{11}$$

which is called a polar form of $z$. The angle $\phi$ in this formula is called an argument of $z$. The argument of $z$ is
not unique because we can add or subtract any multiple of $2\pi$ to it to obtain a different argument of $z$.
However, there is only one argument whose radian measure satisfies


$$-\pi < \phi \le \pi \tag{12}$$


This is called the principal argument of $z$.


Figure B.6 


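EXAMPLE 5 Polar Form of a Complex Number
Express $z = 1 - \sqrt{3}\,i$ in polar form using the principal argument.


Solution The modulus of $z$ is


$$|z| = \sqrt{1^2 + (-\sqrt{3})^2} = \sqrt{4} = 2$$


Thus, it follows from 10 with $a = 1$ and $b = -\sqrt{3}$ that

$$1 = 2\cos\phi \quad \text{and} \quad -\sqrt{3} = 2\sin\phi$$


and this implies that

$$\cos\phi = \frac{1}{2} \quad \text{and} \quad \sin\phi = -\frac{\sqrt{3}}{2}$$

The unique angle $\phi$ that satisfies these equations and whose radian measure satisfies 12 is
$\phi = -\pi/3$ (Figure B.7). Thus, a polar form of $z$ is


$$z = 2\left(\cos\left(-\frac{\pi}{3}\right) + i \sin\left(-\frac{\pi}{3}\right)\right) = 2\left(\cos\frac{\pi}{3} - i \sin\frac{\pi}{3}\right)$$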


Figure B.7 
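

The modulus and principal argument found in Example 5 can be checked with Python's standard `cmath`
module. This is a sketch, not part of the text: `cmath.polar` returns the pair (modulus, phase), and
`cmath.rect` rebuilds the complex number from that pair.

```python
import cmath
import math

z = complex(1, -math.sqrt(3))            # z = 1 - sqrt(3) i

# cmath.polar returns (modulus, phase); for this z the phase is the principal argument.
modulus, argument = cmath.polar(z)

# Rebuild z from its polar form |z|(cos(phi) + i sin(phi)) as in Formula 11.
rebuilt = cmath.rect(modulus, argument)
```

Up to floating-point rounding, `modulus` is 2 and `argument` is $-\pi/3$, and `rebuilt` agrees with `z`.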


Geometric Interpretation of Multiplication and Division of Complex 
Numbers 


We now show how polar forms of complex numbers provide geometric interpretations of multiplication and
division. Let

$$z_1 = |z_1|(\cos\phi_1 + i \sin\phi_1) \quad \text{and} \quad z_2 = |z_2|(\cos\phi_2 + i \sin\phi_2)$$

be polar forms of the nonzero complex numbers $z_1$ and $z_2$. Multiplying, we obtain

$$z_1 z_2 = |z_1||z_2|\left[(\cos\phi_1 \cos\phi_2 - \sin\phi_1 \sin\phi_2) + i(\sin\phi_1 \cos\phi_2 + \cos\phi_1 \sin\phi_2)\right]$$

Now applying the trigonometric identities

$$\cos(\phi_1 + \phi_2) = \cos\phi_1 \cos\phi_2 - \sin\phi_1 \sin\phi_2$$

$$\sin(\phi_1 + \phi_2) = \sin\phi_1 \cos\phi_2 + \cos\phi_1 \sin\phi_2$$


yields

$$z_1 z_2 = |z_1||z_2|\left[\cos(\phi_1 + \phi_2) + i \sin(\phi_1 + \phi_2)\right] \tag{13}$$


which is a polar form of the complex number with modulus $|z_1||z_2|$ and argument $\phi_1 + \phi_2$. Thus, we have
shown that multiplying two complex numbers has the geometric effect of multiplying their moduli and adding
their arguments (Figure B.8).


Figure B.8


Similar kinds of computations show that


$$\frac{z_1}{z_2} = \frac{|z_1|}{|z_2|}\left[\cos(\phi_1 - \phi_2) + i \sin(\phi_1 - \phi_2)\right] \tag{14}$$


which tells us that dividing complex numbers has the geometric effect of dividing their moduli and subtracting
their arguments (both in the appropriate order).


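EXAMPLE 6 Multiplying and Dividing in Polar Form


Use polar forms of the complex numbers $z_1 = 1 + \sqrt{3}\,i$ and $z_2 = \sqrt{3} + i$ to compute $z_1 z_2$ and
$z_1 / z_2$.


Solution Polar forms of these complex numbers are

$$z_1 = 2\left(\cos\frac{\pi}{3} + i \sin\frac{\pi}{3}\right) \quad \text{and} \quad z_2 = 2\left(\cos\frac{\pi}{6} + i \sin\frac{\pi}{6}\right)$$

(verify). Thus, it follows from 13 that


$$z_1 z_2 = 4\left[\cos\left(\frac{\pi}{3} + \frac{\pi}{6}\right) + i \sin\left(\frac{\pi}{3} + \frac{\pi}{6}\right)\right] = 4\left[\cos\frac{\pi}{2} + i \sin\frac{\pi}{2}\right] = 4i$$

and from 14 that


$$\frac{z_1}{z_2} = 1 \cdot \left[\cos\left(\frac{\pi}{3} - \frac{\pi}{6}\right) + i \sin\left(\frac{\pi}{3} - \frac{\pi}{6}\right)\right] = \cos\frac{\pi}{6} + i \sin\frac{\pi}{6} = \frac{\sqrt{3}}{2} + \frac{1}{2}i$$


As a check, let us calculate $z_1 z_2$ and $z_1 / z_2$ directly:

$$z_1 z_2 = (1 + \sqrt{3}\,i)(\sqrt{3} + i) = \sqrt{3} + i + 3i + \sqrt{3}\,i^2 = 4i$$

$$\frac{z_1}{z_2} = \frac{1 + \sqrt{3}\,i}{\sqrt{3} + i} = \frac{1 + \sqrt{3}\,i}{\sqrt{3} + i} \cdot \frac{\sqrt{3} - i}{\sqrt{3} - i} = \frac{\sqrt{3} - i + 3i - \sqrt{3}\,i^2}{3 - i^2} = \frac{2\sqrt{3} + 2i}{4} = \frac{\sqrt{3}}{2} + \frac{1}{2}i$$


which agrees with the results obtained using polar forms.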
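

Formulas 13 and 14 can also be tried numerically. The following sketch multiplies and divides through
polar forms using Python's `cmath` module; the helper names `polar_multiply` and `polar_divide` are
hypothetical and are used only for this illustration.

```python
import cmath
import math


def polar_multiply(z1, z2):
    """Multiply by multiplying the moduli and adding the arguments (Formula 13)."""
    r1, phi1 = cmath.polar(z1)
    r2, phi2 = cmath.polar(z2)
    return cmath.rect(r1 * r2, phi1 + phi2)


def polar_divide(z1, z2):
    """Divide by dividing the moduli and subtracting the arguments (Formula 14)."""
    r1, phi1 = cmath.polar(z1)
    r2, phi2 = cmath.polar(z2)
    return cmath.rect(r1 / r2, phi1 - phi2)


z1 = complex(1, math.sqrt(3))   # 1 + sqrt(3) i
z2 = complex(math.sqrt(3), 1)   # sqrt(3) + i
product_polar = polar_multiply(z1, z2)
quotient_polar = polar_divide(z1, z2)
```

Because the trigonometric functions are evaluated in floating point, the results match $4i$ and
$\frac{\sqrt{3}}{2} + \frac{1}{2}i$ only up to small rounding errors.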


Remark The complex number $i$ has a modulus of 1 and a principal argument of $\pi/2$. Thus, if $z$ is a complex
number, then $iz$ has the same modulus as $z$ but its argument is greater by $\pi/2$ ($= 90°$); that is, multiplication
by $i$ has the geometric effect of rotating the vector $z$ counterclockwise by 90° (Figure B.9).


Figure B.9 


DeMoivre's Formula 


If $n$ is a positive integer, and if $z$ is a nonzero complex number with polar form

$$z = |z|(\cos\phi + i \sin\phi)$$


then raising $z$ to the $n$th power yields


$$z^n = \underbrace{z \cdot z \cdots z}_{n \text{ factors}} = |z|^n\left[\cos(\underbrace{\phi + \phi + \cdots + \phi}_{n \text{ terms}}) + i \sin(\underbrace{\phi + \phi + \cdots + \phi}_{n \text{ terms}})\right]$$

which we can write more succinctly as

$$z^n = |z|^n(\cos n\phi + i \sin n\phi) \tag{15}$$


In the special case where $|z| = 1$ this formula simplifies to

$$z^n = \cos n\phi + i \sin n\phi$$


which, using the polar form for $z$, becomes

$$(\cos\phi + i \sin\phi)^n = \cos n\phi + i \sin n\phi \tag{16}$$


This result is called DeMoivre's formula.
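

As a rough numerical check of Formula 15, the sketch below raises a complex number to a power through
its polar form and compares the result with repeated multiplication. The function name `demoivre_power`
and the test value $z = 1 + i$ are assumptions made for this illustration.

```python
import cmath


def demoivre_power(z, n):
    """Return z**n computed as |z|^n (cos(n*phi) + i sin(n*phi)) (Formula 15)."""
    modulus, phi = cmath.polar(z)
    return cmath.rect(modulus ** n, n * phi)


z = complex(1, 1)                 # modulus sqrt(2), argument pi/4
via_formula = demoivre_power(z, 8)
via_multiplication = z ** 8       # Python's built-in complex power
```

Here the modulus is raised to the eighth power and the argument is multiplied by eight, so both
computations give 16 up to rounding.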


Euler's Formula 


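If $\theta$ is a real number, say the radian measure of some angle, then the complex exponential function $e^{i\theta}$ is
defined to be

$$e^{i\theta} = \cos\theta + i \sin\theta \tag{17}$$

which is sometimes called Euler's formula. One motivation for this formula comes from the Maclaurin series
in calculus. Readers who have studied infinite series in calculus can deduce 17 by formally substituting $i\theta$ for
$x$ in the Maclaurin series for $e^x$ and writing


$$e^{i\theta} = 1 + i\theta + \frac{(i\theta)^2}{2!} + \frac{(i\theta)^3}{3!} + \frac{(i\theta)^4}{4!} + \frac{(i\theta)^5}{5!} + \frac{(i\theta)^6}{6!} + \cdots$$

$$= 1 + i\theta - \frac{\theta^2}{2!} - i\frac{\theta^3}{3!} + \frac{\theta^4}{4!} + i\frac{\theta^5}{5!} - \frac{\theta^6}{6!} + \cdots$$

$$= \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \frac{\theta^6}{6!} + \cdots\right) + i\left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right)$$

$$= \cos\theta + i \sin\theta$$


where the last step follows from the Maclaurin series for $\cos\theta$ and $\sin\theta$.
If $z = a + bi$ is any complex number, then the complex exponential $e^z$ is defined to be

$$e^z = e^{a + bi} = e^a e^{ib} = e^a(\cos b + i \sin b) \tag{18}$$

It can be proved that complex exponentials satisfy the standard laws of exponents. Thus, for example,


$$e^{z_1} e^{z_2} = e^{z_1 + z_2}, \quad \frac{e^{z_1}}{e^{z_2}} = e^{z_1 - z_2}, \quad \frac{1}{e^z} = e^{-z}$$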
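

Formulas 17 and 18 can be compared numerically with the Maclaurin series and with the library
exponential. The sketch below is illustrative only; the function names `exp_via_euler` and
`maclaurin_exp_i` and the choice of 20 series terms are assumptions, while `cmath.exp` is Python's
standard complex exponential.

```python
import cmath
import math


def exp_via_euler(z):
    """Return e^z as e^a (cos b + i sin b) for z = a + bi (Formula 18)."""
    return math.exp(z.real) * complex(math.cos(z.imag), math.sin(z.imag))


def maclaurin_exp_i(theta, terms=20):
    """Partial sum of the Maclaurin series for e^x with i*theta substituted for x."""
    total = 0j
    term = 1 + 0j                          # the k = 0 term, (i*theta)^0 / 0!
    for k in range(terms):
        total += term
        term *= 1j * theta / (k + 1)       # next term: (i*theta)^(k+1) / (k+1)!
    return total


z = complex(0.5, 2.0)
from_formula = exp_via_euler(z)
from_library = cmath.exp(z)
series_value = maclaurin_exp_i(1.0)
euler_value = complex(math.cos(1.0), math.sin(1.0))
```

With 20 terms the partial sum at $\theta = 1$ already agrees with $\cos 1 + i \sin 1$ to within
floating-point rounding, which is consistent with the formal derivation above.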




Answer to Exercises 


Exercise Set 1.1 
1. (a), (c), and (f) are linear equations; (b), (d) and (e) are not linear equations
3. (a) and (d) are linear systems; (b) and (c) are not linear systems
5. (a) and (d) are both consistent
7. (a), (d), and (e) are solutions; (b) and (c) are not solutions
9. 


y=zet 
wns dofeeded 
x2= Pr 
x3 = s 
x4 = & 
11. a. 2x4 0 
3x, — 4x2 = O 
x2 = 1 
b. 3x1 =— 2x3 = 5 
7x, + x20 + 4x3 = =3 
—2x2 + x3 = 7 
c. 7x1 + 2x2 4 x3 =— 3x4 = 5 
xX, + 2x2 + 4x3 = 1 
d. *1 = a 
x2 = =2 
x3 = 3 
x4 = 
13. ac 6 
3. 8 
9 =3 
b. E -1 3 1 
0 5 -1 1 
‘| 0 2 0-3 1 «0 
-3 -1 1 0 O =1 
6 2-1 2 =-3 6 
a [1000 -1 7] 


True/False 1.1 
(a) True 

(b) False 

(c) True 
(d) True 
(e) False 
(f) False 
(g) True 
(h) False 


Exercise Set 1.2 


1. a. Both
b. Both
c. Both
d. Both
e. Both
f. Both
g. Row echelon

3.0 a, a1 = = 37, x= —8, x3=5 
b. %1 = 13-10, x2 = 13t=—5, x3= —£4+2, x4=8 
ce Xp= —Fo+ 2-11, xg=8, x3= —3t—4, xg= —3t4+9, x5=8t 


d. Inconsistent 


5. $x_1 = 3$, $x_2 = 1$, $x_3 = 2$
7. $x = t - 1$, $y = 2s$, $z = s$, $w = t$
9. $x_1 = 3$, $x_2 = 1$, $x_3 = 2$
11. $x = t - 1$, $y = 2s$, $z = s$, $w = t$
13. Has nontrivial solutions
15. Has nontrivial solutions
17. $x_1 = 0$, $x_2 = 0$, $x_3 = 0$


19, X1 = —s, X2= —t—s, x3=45, x4=8 


21. $w = t$, $x = -t$, $y = t$, $z = 0$
23. 4=—-1, 29=0, 1g=1, Ig=2 


25. If $a = 4$, there are infinitely many solutions; if $a = -4$, there are no solutions; if $a \neq \pm 4$, there is exactly one solution.
27. If $a = 3$, there are infinitely many solutions; if $a = -3$, there are no solutions; if $a \neq \pm 3$, there is exactly one solution.
29.,—2@_ 48 ,_ _ a 2b 

. a oe eo 
oe E | and E | are possible answers. 


35.x= +1, y= +93, z= +92 
37. 4=1, b= —6, c—2, d= 10 


39. The nonhomogeneous system will have exactly one solution. 


True/False 1.2 
(a) True 

(b) False 

(c) False 

(d) True 

(e) True 

(f) False 

(g) True 

(h) False 

(i) False 
Exercise Set 1.3 
i. 


a. Undefined 
pb. 4x2 
c. Undefined 
d. Undefined 
e. 5x5 
f52 
g. Undefined 
h. 2x2 
3 a [ 765 
—2 13 
737 
b.|—-5 4 =1 
oO =-1 =1 
=-1 1 1 
¢ 15 0 
=5 10 
5 5 
d. | -—? =-28 -14 
=—21 -7 =—35 
e. Undefined 
f. | 22 -6 8 
—2 46 
10060 4 
g. | —39 =21 =24 
9 =6 =—15 


—33 =-12 —30 


11. 


13. 


k. 168 
|. Undefined 


a.| 12 =3 
=—4 5 
4 1 
b. Undefined 
ec. |42 108 75 
12 =3 21 
36 «78 «63 
d | 3 45 9 
11 =-11 17 
7 17 13 
e | 3 45 9 
11 =-11 17 
7 17 13 
f. {21 17 
17? 35 
g | 0 =—2 11 
12 1 8 
h. | 12 6 9 
48 =—20 14 
24 8 16 
i. 61 
j. 35 
k. 28 
. 99 
a. [674141] 
b. [63 67 57] 
c. | 41 
21 
67 
d.| 6 
6 
63 
e, [24 56 97] 
£. | 76 
98 
97 
a. | =3 3 -2 12 3 -2 7 76 3 -2 7 
48)/=3)6/+6} 5], |}29)/=—2/6)+5) 5]/+4]4]; |98]/=7/6]+4] 5/4914 
24 0 4 56 0 4 9 97 0 4 9 
b. | 64 6 4 14 6 -2 4 38 6 —2 4 
21)/=6/0]+7/3); |22}= —2)0]+ 1] +7)3 18|=4)0/+3) 1)45)3 
7 7 5 28 7 7 5 74 7 7 5 
a. {2 —3 5/[%1 7 
9 =1 1]/%2)/=] =-1 
1 5 4), %3 0 
b.|]4 0 =—3 1)/"1 1 
5 1 O =—8|/%2]_ |3 
2-5 9 -1|/%3]" /0 
0 3 =1 FI/*4 2 
a. 5x1 + 6x2 =— 7x3 2 
—x, — 2x2 + 3x3 = 90 
4x2 - x3 = 3 


b. 41 + x2 + x3 = 2 


2x1 + 3x2 = 2 
5x, — 3x2 — 6x3 = =-9 
15, —1 
17, @=4, b= -6,¢=-1,d=1 
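23. a. $\begin{bmatrix} a_{11} & 0 & 0 & 0 & 0 & 0 \\ 0 & a_{22} & 0 & 0 & 0 & 0 \\ 0 & 0 & a_{33} & 0 & 0 & 0 \\ 0 & 0 & 0 & a_{44} & 0 & 0 \\ 0 & 0 & 0 & 0 & a_{55} & 0 \\ 0 & 0 & 0 & 0 & 0 & a_{66} \end{bmatrix}$
b. $\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} \\ 0 & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} \\ 0 & 0 & a_{33} & a_{34} & a_{35} & a_{36} \\ 0 & 0 & 0 & a_{44} & a_{45} & a_{46} \\ 0 & 0 & 0 & 0 & a_{55} & a_{56} \\ 0 & 0 & 0 & 0 & 0 & a_{66} \end{bmatrix}$
c. $\begin{bmatrix} a_{11} & 0 & 0 & 0 & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 & 0 & 0 \\ a_{31} & a_{32} & a_{33} & 0 & 0 & 0 \\ a_{41} & a_{42} & a_{43} & a_{44} & 0 & 0 \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & 0 \\ a_{61} & a_{62} & a_{63} & a_{64} & a_{65} & a_{66} \end{bmatrix}$
d. $\begin{bmatrix} a_{11} & a_{12} & 0 & 0 & 0 & 0 \\ a_{21} & a_{22} & a_{23} & 0 & 0 & 0 \\ 0 & a_{32} & a_{33} & a_{34} & 0 & 0 \\ 0 & 0 & a_{43} & a_{44} & a_{45} & 0 \\ 0 & 0 & 0 & a_{54} & a_{55} & a_{56} \\ 0 & 0 & 0 & 0 & a_{65} & a_{66} \end{bmatrix}$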
x Xp +x 
25, t(a)=| = ’) 


hae 11 £40 
One; namely, A=] 1 —1 0 
0 0 


True/False 1.3 
(a) True 
(b) False 
(c) False 
(d) False 
(e) True 
(f) False 
(g) False 
(h) True 
(i) True 
(j) True 
(k) True
(l) False
(m) True 
(n) True 
(o) False 


Exercise Set 1.4 


17. 


f. | 39 13 
26 13 
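21. a. $\begin{bmatrix} 27 & 0 & 0 \\ 0 & 26 & -18 \\ 0 & 18 & 26 \end{bmatrix}$
b. $\begin{bmatrix} \frac{1}{27} & 0 & 0 \\ 0 & 0.026 & 0.018 \\ 0 & -0.018 & 0.026 \end{bmatrix}$
c. $\begin{bmatrix} 4 & 0 & 0 \\ 0 & -5 & -12 \\ 0 & 12 & -5 \end{bmatrix}$
d. $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -3 & 3 \\ 0 & -3 & -3 \end{bmatrix}$
e. $\begin{bmatrix} 16 & 0 & 0 \\ 0 & -14 & -15 \\ 0 & 15 & -14 \end{bmatrix}$
f. $\begin{bmatrix} 25 & 0 & 0 \\ 0 & 32 & -24 \\ 0 & 24 & 32 \end{bmatrix}$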
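A quick consistency check on parts (a) and (b) above can be sketched as follows. It assumes that part (b) is meant to be the inverse of the matrix in part (a), that its first entry is 1/27, and that its (3, 2) entry is negative; those readings are inferences from the printed entries, not something the answer key states.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Part (a), and the assumed reading of part (b) with decimals as exact
# fractions (0.026 = 26/1000, 0.018 = 18/1000).
part_a = [[27, 0, 0], [0, 26, -18], [0, 18, 26]]
part_b = [[Fraction(1, 27), 0, 0],
          [0, Fraction(26, 1000), Fraction(18, 1000)],
          [0, Fraction(-18, 1000), Fraction(26, 1000)]]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
product = matmul(part_a, part_b)
print(product == identity)  # True under the assumptions above
```

Exact fractions are used so that the comparison with the identity matrix is not disturbed by floating-point rounding.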
27.[_1 9 0 
Eat 
a 0 
a2 
0 60 1 
ayn 
+ 
31. p=cate1agc7(8"| a 
33. Bo 
35. ee eens 
2 2 2 
-1 111 
a= oot 9 
1111 
a er) 
37. 111 
22 2 
At=/_11 1 
a ie 2 
10 0 
99 ol dF 
A 95" 02 — oa 
| (er ar aR 
ae 
True/False 1.4 
(a) False 
(b) False 
(c) False 
(d) False 
(e) False 
(f) True
(g) True 
(h) True 
(i) False 
(j) True 
(k) False 


Exercise Set 1.5 


1. a. Elementary 


b. Not elementary 
c. Not elementary 


d. Not elementary 


3: 
® Add 3 times row 2 to row 1: ls | 
hi ; -5 00 
Multiply row | by aah 010 
001 
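c. Add 5 times row 1 to row 3: $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 5 & 0 & 1 \end{bmatrix}$
d. Swap rows 1 and 3: $\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$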
5. oa, : 3-6 -—6 —6 
Swaprows Land2:84=| + 9 5 =] 
b. 2-1 0 =-4 =4 
Add —3 times row 2 torow3:#A=| 1 —3 =1 5 3 
-1 9 4 =12 =—10 
c 13 28 
Add 4 times row 3 torow1:ZA=| 2 5 
3 6 
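7. a. $\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$
b. $\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$
c. $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}$
d. $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}$
9. $\begin{bmatrix} -7 & 4 \\ 2 & -1 \end{bmatrix}$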
TES 52" <3: 
7 7 
31 
7 7 
is [ Sat 26 
2 10 5 
=-1 1 1 
ai, 3. @ 
2 10 5 
15. No inverse 
17. pe ee 
2 2 2 
i 2 A 
2 2 2 
td Jd 1 
2 2 2 
19.[ 7 
2 
-1 
0 


afao4 o 
4 3 3 8 
ak 1 _3 
“8 4 2 ? 
1 
0 0 f 0 
a eek wal U1 
40 ~20 ~10 ~5 
23.) 7 5S 35 _1 
12 24 8 4 
5 #3 121 
6 12 4 72 
2. 9... Boh 
12 2 8 4 
ad =i tb 2 
12-24 “8 4 
25; af 1 
EZ? 9 9 
aie 
Ho 9 
a 
0 0% 0 
=8 
00 0g 
b. [4 1 
ilo a 
0 1 0 0 
1 _1 
oo 3 -4 
0001 
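27. $c \neq 0, 1$
29. $\begin{bmatrix} -3 & 1 \\ 2 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} -4 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$
31. $\begin{bmatrix} 1 & 0 & -2 \\ 0 & 4 & 3 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 1 \end{bmatrix}$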
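The factorizations in Exercises 29 and 31 can be checked by multiplying the elementary matrices back together. The sketch below is only a verification aid; the factor entries are the ones read from the answers above, and the last factor of Exercise 29 is an assumed reading of a damaged entry.

```python
def matmul(X, Y):
    """Multiply matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def product(factors):
    """Multiply a list of matrices from left to right."""
    result = factors[0]
    for factor in factors[1:]:
        result = matmul(result, factor)
    return result

# Exercise 29: four 2x2 elementary matrices.
ex29 = product([[[1, 0], [0, 2]], [[1, 1], [0, 1]],
                [[-4, 0], [0, 1]], [[1, 0], [1, 1]]])

# Exercise 31: three 3x3 elementary matrices.
ex31 = product([[[1, 0, -2], [0, 1, 0], [0, 0, 1]],
                [[1, 0, 0], [0, 1, 3], [0, 0, 1]],
                [[1, 0, 0], [0, 4, 0], [0, 0, 1]]])

print(ex29)  # [[-3, 1], [2, 2]]
print(ex31)  # [[1, 0, -2], [0, 4, 3], [0, 0, 1]]
```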
33. f_1 1 
4 8|_[ 1 o]}/-3 offi -17)' © 
1 3] |-11 o1ilo tio S 
4 8 
3.[1 0 2] [1 00], 5 oioo 
0 4 -3/=/0 4 offo 1 -3]/0 1 0 
00 iffoo1 
oo 1| Jo o1 


37. Add $-1$ times the first row to the second row. Add $-1$ times the first row to the third row. Add $-1$ times the second row to the first row. Add the second row to the third row.

True/False 1.5 

(a) False 

(b) True 

(c) True 

(d) True 

(e) True 

(f) True 

(g) False 


Exercise Set 1.6 


1. $x_1 = 3$, $x_2 = -1$
3. $x_1 = -1$, $x_2 = 4$, $x_3 = -7$
5. $x = 1$, $y = 5$, $z = -1$
7. $x_1 = 2b_1 - 5b_2$, $x_2 = -b_1 + 3b_2$
BY 22 ,,— 1 

a= 

i. ,, 221 ,,-U 
os aa dae 


i. i a 4. 
AI 45° 72°95 
34 28 
il. ee 
as a 
fi. 219. 13: 
AL= 45° 7245 
i 1 3 
iv. =a —2 
x1 5° x2 5 
13. No conditions on $b_1$ and $b_2$
15. $b_3 = b_1 - b_2$
17. $b_1 = b_3 + b_4$, $b_2 = 2b_3 + b_4$
19. $X = \begin{bmatrix} 11 & 12 & -3 & 27 & 26 \\ -6 & -8 & 1 & -18 & -17 \\ -15 & -21 & 9 & -38 & -35 \end{bmatrix}$
True/False 1.6 
(a) True 
(b) True 
(c) True 
(d) True 
(e) True
(f) True 
(g) True 
Exercise Set 1.7 
| 
5 0 
1 
ae 
2,./=-1 0 0 
1 
0 5 0 
0 0 3 
5./6 3 
4 -1 
4 10 
7. | =<15 10 60 (20 =—20 


35. 


. Not symmetric 
. Symmetric 

. Not symmetric 
. Not symmetric 
. Not invertible 
_@=-8 
_x#1, -2,4 


1 0 oO 
0 =1 0 
o oO -1 
a. Yes 
b. No (unless » = 1) 


c. Yes 


a 


. No (unless » = 1) 


10 
. 10 2 -k 1 0 
A= At Ara 
E ‘| 1 k i=" 


11. 


2* 0 0 
0 3" 9g 
0 0 4% 


43. 


True/False 1.7 
(a) True 

(b) False 

(c) False 

(d) True 

(e) True 

(f) False 

(g) False 

(h) True 

(i) True 

(j) False 

(k) False

(l) False

(m) True 
Exercise Set 1.8 

1. 50 


30 60 


10 50 


40 
3. a. $x_3 - x_4 = -500$, $-x_1 + x_4 = 100$, $x_1 - x_2 = 300$, $x_2 - x_3 = 100$
b. $x_1 = -100 + t$, $x_2 = -400 + t$, $x_3 = -500 + t$, $x_4 = t$


c. For all rates to be nonnegative, we need $t = 500$ cars per hour, so $x_1 = 400$, $x_2 = 100$, $x_3 = 0$, $x_4 = 500$
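The three parts of this answer fit together as follows: the parametric solution in (b) satisfies every equation in (a) for any $t$, and the smallest $t$ that keeps all four flows nonnegative is the value quoted in (c). A minimal sketch of that check, with function names chosen here for illustration:

```python
def flows(t):
    """Parametric solution from part (b): (x1, x2, x3, x4) as a function of t."""
    return (-100 + t, -400 + t, -500 + t, t)

def satisfies_network(x1, x2, x3, x4):
    """The four flow-balance equations from part (a)."""
    return (x3 - x4 == -500 and -x1 + x4 == 100
            and x1 - x2 == 300 and x2 - x3 == 100)

# Smallest integer t (searched over 0..1000) for which no flow is negative.
smallest_t = min(t for t in range(0, 1001)
                 if all(x >= 0 for x in flows(t)))

print(smallest_t)         # 500
print(flows(smallest_t))  # (400, 100, 0, 500)
```

The binding constraint is $x_3 = -500 + t \ge 0$, which is why the threshold is $t = 500$.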


= 24, adh 


te aigels=ig= ta, he=enadd 


9. $x_1 = 1$, $x_2 = 5$, $x_3 = 3$, and $x_4 = 4$; the balanced equation is $\mathrm{C_3H_8 + 5O_2 \rightarrow 3CO_2 + 4H_2O}$
11. $x_1 = x_2 = x_3 = x_4 = t$; the balanced equation is $\mathrm{CH_3COF + H_2O \rightarrow CH_3COOH + HF}$
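Balancing amounts to requiring that each element appears the same number of times on both sides. The sketch below counts atoms for the two balanced equations above (propane combustion, and acetyl fluoride with water, taking all four coefficients equal to 1); the dictionary encoding of each formula is an illustrative choice.

```python
from collections import Counter

def atoms(side):
    """Total atom counts for one side: a list of (coefficient, formula) pairs."""
    total = Counter()
    for coeff, formula in side:
        for element, count in formula.items():
            total[element] += coeff * count
    return total

# C3H8 + 5 O2 -> 3 CO2 + 4 H2O
propane_left = [(1, {"C": 3, "H": 8}), (5, {"O": 2})]
propane_right = [(3, {"C": 1, "O": 2}), (4, {"H": 2, "O": 1})]

# CH3COF + H2O -> CH3COOH + HF
acetyl_left = [(1, {"C": 2, "H": 3, "O": 1, "F": 1}), (1, {"H": 2, "O": 1})]
acetyl_right = [(1, {"C": 2, "H": 4, "O": 2}), (1, {"H": 1, "F": 1})]

print(atoms(propane_left) == atoms(propane_right))  # True
print(atoms(acetyl_left) == atoms(acetyl_right))    # True
```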
13. $p(x) = x^2 - 2x + 2$
B13 

6° 6" 


a. Using $a_1 = k$ as a parameter, $p(x) = 1 + kx + (1 - k)x^2$ where $-\infty < k < \infty$.


15. p(x) =1+ 


b. The graphs fork = 0, 1, 2, and 3 are shown. 


True/False 1.8 
(a) True 
(b) False 
(c) True 
(d) False 
(e) False 


Exercise Set 1.9 


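1. a. $\begin{bmatrix} 0.50 & 0.25 \\ 0.25 & 0.10 \end{bmatrix}$
b. $\begin{bmatrix} \$25{,}290 \\ \$22{,}581 \end{bmatrix}$
3. a. $\begin{bmatrix} 0.1 & 0.6 & 0.4 \\ 0.3 & 0.2 & 0.3 \\ 0.4 & 0.1 & 0.2 \end{bmatrix}$
b. $\begin{bmatrix} \$31{,}500 \\ \$26{,}500 \\ \$26{,}300 \end{bmatrix}$
5. $\begin{bmatrix} 123.08 \\ 202.56 \end{bmatrix}$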


True/False 1.9 


(a) False 
(b) True 
(c) False 
(d) True 
(e) True 
Chapter 1 Supplementary Exercises 
1. 3x1 — x2 + x4 = 1 

2x1 + 3x3 + 3x4 = =1 

3. 3,_ 1 GO. __ 1, 5 

x,= me an > x2=- 38 an > XZ=8, X4=8t 
3. 2x, — 4x2 + x3 = 6 

—4x4 + 3x3 = =1 

x2 = x3 = 3 
Qed eee een 


5, fo 3,44, yl 454.3 
x= Ex+ sy, y at bey 


7. x=4, y=2, ges 
9. 4, a#0, b#2 


bp. @#0, b=2 
o @=0, b=2 
d. @=0, b#2 
is |'0-2 
el 
13. 4 -1 3 -1 
x=[ 6 0 i 
b. y_]1 —2 
[54 
c 113) _ 160 
37 37 
er) o0. ode 
37 37 


15, @=1, b= =2, c=3 


Exercise Set 2.1 
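1. $M_{11} = 29$, $C_{11} = 29$
$M_{12} = 21$, $C_{12} = -21$
$M_{13} = 27$, $C_{13} = 27$
$M_{21} = -11$, $C_{21} = 11$
$M_{22} = 13$, $C_{22} = 13$
$M_{23} = -5$, $C_{23} = 5$
$M_{31} = -19$, $C_{31} = -19$
$M_{32} = -19$, $C_{32} = 19$
$M_{33} = 19$, $C_{33} = 19$
3. a. $M_{13} = 0$, $C_{13} = 0$
b. $M_{23} = -96$, $C_{23} = 96$
c. $M_{22} = -48$, $C_{22} = -48$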
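Every cofactor in the two answers above follows from its minor by the sign rule $C_{ij} = (-1)^{i+j} M_{ij}$. A short sketch applying that rule to the minors listed for Exercise 1:

```python
def cofactor(i, j, minor):
    """Cofactor from a minor using the checkerboard sign rule (1-based indices)."""
    return (-1) ** (i + j) * minor

# Minors M_ij as listed in the answer to Exercise 1.
minors = {(1, 1): 29, (1, 2): 21, (1, 3): 27,
          (2, 1): -11, (2, 2): 13, (2, 3): -5,
          (3, 1): -19, (3, 2): -19, (3, 3): 19}

cofactors = {ij: cofactor(ij[0], ij[1], m) for ij, m in minors.items()}
print(cofactors[(1, 2)], cofactors[(2, 1)], cofactors[(3, 2)])  # -21 11 19
```

Only the entries with $i + j$ odd change sign, which matches the four sign flips in the list.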


22: 


59; 


a ogee ie 
| 59 59 
9. $a^2 - 5a + 21$

11. $-65$

13. $-123$

15. $\lambda = 1$ or $-3$

17. $\lambda = 1$ or $-1$

19. (all parts) $-123$


33. The determinant is $\sin^2\theta + \cos^2\theta = 1$.
35. $d_2 = d_1 + \lambda$


True/False 2.1 
(a) False 

(b) False 

(c) True 

(d) True 

(e) True 

(f) False 

(g) False 

(h) False 

(i) True 
Exercise Set 2.2 
5. $-5$

=! 

9.1 

11...5 

13. 33 

15. 6 

17. $-2$


19. Exercises 14: 39; Exercise 15: 6; Exercise 16: -i: Exercise 17: —2 


21. —6 
23. 72 


True/False 2.2 
(a) True 

(b) True 

(c) False 

(d) False 

(e) True 

(f) True 
Exercise Set 2.3 
7. Invertible 

9. Invertible 


11. 
13. 
15. 


17. 
19. 


21. 


23. 


25. 


27. 


29. 
31. 
35. 


Sie 


Not invertible 
Invertible 
5+ 417 
Ke 5 
ke —1 
3 =-5 =5 
At=|-3 4 5 
2-2 =3 
1 3 
2 2 / 
=f 3 
A “=|0 1 5 
1 
0 0 5 
-4 3 0-1 
= 2-1 0 90 
a= —7 oO -1 8 
6 0 1-7? 
3 2 1 
soe | die | aan 
30 38 40 
a ie ee TT 
Cramer's rule does not apply. 
y=0 
a, —189 
b. 1 
a 
c 8 
e 
re 
56 
e 7 


True/False 2.3 
(a) False
(b) False
(c) True
(d) False
(e) True
(f) True
(g) True
(h) True
(i) True
(j) True
(k) True
(l) False


Chapter 2 Supplementary Exercises 


-18 


24 
—10 


329 


. Exercise 3: 24; Exercise 4: 0; Exercise 5: —1Q; Exercise 6: —48 

. The matrices in Exercise 1-3 are invertible, the matrix in Exercise 4 is not. 
. —b? 4. 5b = 21 

, =120 


17. 


19. 


| 
— ive] 
Shy Sle 


23. 


31 72 102 __ 15 
329 329 329 329 


i) 
- 
——oaaoaooa_—__—_— oe 
I 
| tn[ro Ul Uj 
I 
thl|a Unio Unipo 
I _ 
— 
Jus Lrlpo S|- N 


5, oi 3554, ao 4e a 3 
stb ayy 5S 
29. 2 2 2 2 2 2 
(b) _ ot+a*—b _ at+be—c 
cna= 2ac rt 2ab 


Exercise Set 3.1 


1. a 


11. 


13. 


15. 


17. 


> PyP2 = (-1, 3) 
oo 
. PyPo = (—3, 6, 1) 


a. The terminal point is B(2, 3). 


. The initial point is A(—2, —2, — 1). 


a. u=(—1, 2, —4) is one possible answer. 


as 


bg 


-u=(7, —2, —6) is one possible answer. 


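a. $\mathbf{u} + \mathbf{w} = (1, -4)$
b. $\mathbf{v} - 3\mathbf{u} = (-12, 8)$
c. $2(\mathbf{u} - 5\mathbf{w}) = (38, 28)$
d. $3\mathbf{v} - 2(\mathbf{u} + 2\mathbf{w}) = (4, 29)$
e. $-3(\mathbf{w} - 2\mathbf{u} + \mathbf{v}) = (33, -12)$
f. $(-2\mathbf{u} - \mathbf{v}) - 5(\mathbf{v} + 3\mathbf{w}) = (37, 17)$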


. (=1,9, =11, 1) 


(22, 53, —19, 14) 
(=-13, 13, —36, —2) 


_ (=90, —114, 60, —36) 


(<9, <5, =5, =3) 


, (27,29, = 27,9) 


a, w-u=(-9,3, —3, —8,5) 
b. 2v-+ 3u= (13, —5, 14, 13, —9) 


mo Bo 


19. 


Y 


_ —w+ 3(v—u) =(-14, —2, 24, 2,7) 


v-w=(-2,1, -4, -2,7) 


b, 6u+ 2v=(—10, 6, —4, 26, 28) 


21. y— 


_ (Qu= Fw) — (8v +0) = (-77, 8, 94, — 25, 23) 


818211 
3° O° 3° 3° 6 


23. a. Not parallel 
b. Parallel 


c. Parallel 


25. @=3, b=-1 


27. 1 = 
29. 1= 
eb 


a. 


2,¢e9= 1, 63=5 
1, cg=1, ¢3= —1, c4=1 


(3 1 -4) 
ae ae 


b. f23 9 1 
4° "474 
True/False 3.1 
(a) False 


(b) False 
(c) False 


(d) True 
(e) True 


(f) False 
(g) False 


(h) True 


(i) False 


(j) True 


(k) False 


Exercise Set 3.2 


I 


b. 


e 


11. 


ia 


ic} 


a. 


= - 


f=] 


+ vl = $83 

| + [lvl = 17 + 26 
—2u + 2v|| = 2y3 
—3u — Sv +wll = 466 


=] 


a, ||3u— Sv +4 wll| = ¥ 2570 


{[3u|] — Sv] + llwl] = 3y'46 — 10721 + y'42 


|| — llullvl| = 2y 966 


k= —2 


_ 
7 


u'v= —8 u-u=26, v-v=24 


u:v=0, u-u=44, v:-v=21 


. |ju—vl]| = 14 
 |ju—vil = 59 
» |ju=vll = 677 


16) 


) 


5(<v-+4u—w) = (125, - 25, - 20,75, —70) 
_ =2(3w- v) + (2u-+w) = (32, — 10, 1,27, — 
on a a a2 = 

1 w— Sut 2u) +¥ & 3, -12, -3, 


—~5 v _f4 _3\) _ vw _f_4 3 
lvl =>, rar (> 3) Iv (-$. 5) 


1 


el 


Vis 


(1, 0, 2, 1, 3) 


13. 


15 
* cos 6= ——-— : pis acute 
ayia 
b. cos = — A g is obtu 
yeas ; 0 is obtuse 
136 
© cos@= — === —— . 9 is obtuse 
y 225y 180 ° 
15. 
. a‘b= a5¥3 
2 
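17. a. $\mathbf{u} \cdot (\mathbf{v} \cdot \mathbf{w})$ does not make sense because $\mathbf{v} \cdot \mathbf{w}$ is a scalar.
b. $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w})$ makes sense.
c. $\|\mathbf{u} \cdot \mathbf{v}\|$ does not make sense because the quantity inside the norm is a scalar.
d. $(\mathbf{u} \cdot \mathbf{v}) - \|\mathbf{u}\|$ makes sense since the terms are both scalars.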
Y% af4 3 
> “3 
oi (a ee ae 
5f2° 52 
*(_3 4 93 
47 2° 4 
Me ss ca eS A 
¥55 55 55) 55) 55 
aa! Cee 
962 
Db. cosf@= = _—3_ 
y10 
c, cos#=0 
d. cos#=0 
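25. a. $|\mathbf{u} \cdot \mathbf{v}| = 10$, $\|\mathbf{u}\|\|\mathbf{v}\| = \sqrt{13}\sqrt{17} \approx 14.866$
b. $|\mathbf{u} \cdot \mathbf{v}| = 7$, $\|\mathbf{u}\|\|\mathbf{v}\| = \sqrt{10}\sqrt{14} \approx 11.832$
c. $|\mathbf{u} \cdot \mathbf{v}| = 5$, $\|\mathbf{u}\|\|\mathbf{v}\| = (3)(2) = 6$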
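Each part of Exercise 25 illustrates the Cauchy–Schwarz inequality $|\mathbf{u} \cdot \mathbf{v}| \le \|\mathbf{u}\|\|\mathbf{v}\|$. The sketch below only re-evaluates the norm products quoted in the answer and compares them with the listed dot products; the vectors themselves are not reproduced here.

```python
import math

# (|u . v|, ||u|| ||v||) for parts a, b, c, as quoted in the answer.
cases = {
    "a": (10, math.sqrt(13) * math.sqrt(17)),
    "b": (7, math.sqrt(10) * math.sqrt(14)),
    "c": (5, 3 * 2),
}

for label, (lhs, rhs) in cases.items():
    # The inequality holds when the left side does not exceed the right side.
    print(label, lhs, round(rhs, 3), lhs <= rhs)
```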


27. A sphere of radius 1 centered at $(x_0, y_0, z_0)$.
True/False 3.2 


(a) True 
(b) True 


(c) False 


(d) True 
(e) True 


(f) False 
(g) False 
(h) False 


(i) True 

(j) True 

Exercise Set 3.3
1. a. Orthogonal
b. Not orthogonal
c. Not orthogonal
d. Not orthogonal

a. Not an orthogonal set
b. Orthogonal set
c. Orthogonal set
d. Not an orthogonal set


at ea) 


7. Yes 


39. 
41. 


2 =2(x+ 1) + GY =—3)=—@+2)=0 
, 2z=0 

. Not parallel 

. Parallel 

. Not perpendicular 


a, 2 


13 6 2 80) (584 -11) 
gt 13.77 \ 137" 13 


0 (The planes coincide.) 


(P) cos f= 2 cos y= —o— 
Pd oe Tl 


True/False 3.3 
(a) True 
(b) True 
(c) True 
(d) True 
(e) True 


(f) False


(g) False 


Exercise Set 3.4 


1. 


11. 


13. 


15. 


17. 
19. 


Vector equation: $(x, y) = (-4, 1) + t(0, -8)$;
parametric equations: $x = -4$, $y = 1 - 8t$

Vector equation: $(x, y, z) = t(-3, 0, 1)$;
parametric equations: $x = -3t$, $y = 0$, $z = t$

Point: $(3, -6)$; parallel vector: $(-5, -1)$
Point: $(4, 6)$; parallel vector: $(-6, -6)$

Vector equation: $(x, y, z) = (-3, 1, 0) + t_1(0, -3, 6) + t_2(-5, 1, 2)$;
parametric equations: $x = -3 - 5t_2$, $y = 1 - 3t_1 + t_2$, $z = 6t_1 + 2t_2$

Vector equation: $(x, y, z) = (-1, 1, 4) + t_1(6, -1, 0) + t_2(-1, 3, 1)$;
parametric equations: $x = -1 + 6t_1 - t_2$, $y = 1 - t_1 + 3t_2$, $z = 4 + t_2$

A possible answer is vector equation: $(x, y) = t(3, 2)$;
parametric equations: $x = 3t$, $y = 2t$

A possible answer is vector equation: $(x, y, z) = t_1(0, 1, 0) + t_2(5, 0, 4)$;
parametric equations: $x = 5t_2$, $y = t_1$, $z = 4t_2$
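The parametric equations in these answers are just the components of the corresponding vector equation. As an illustration, the sketch below evaluates the vector form of the plane through $(-3, 1, 0)$ with direction vectors $(0, -3, 6)$ and $(-5, 1, 2)$ and compares it with the parametric form at a few sample parameter values; the helper names are chosen here for illustration.

```python
def point_on_plane(p0, v1, v2, t1, t2):
    """Evaluate the vector equation x = p0 + t1*v1 + t2*v2 componentwise."""
    return tuple(p + t1 * a + t2 * b for p, a, b in zip(p0, v1, v2))

def parametric(t1, t2):
    """Parametric equations read off from the same plane."""
    return (-3 - 5 * t2, 1 - 3 * t1 + t2, 6 * t1 + 2 * t2)

samples = [(0, 0), (1, 0), (0, 1), (2, -3)]
agree = all(
    point_on_plane((-3, 1, 0), (0, -3, 6), (-5, 1, 2), t1, t2) == parametric(t1, t2)
    for t1, t2 in samples
)
print(agree)  # True
```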
xy= —s—t, x2=8, x3=8 
3 19 8 2 


1 3 
= = = t 
m= or--7s at. x2 art 45+ st, XZ=P, X4=S, X5 


21. a. $(1, 0, 0) + s(-1, 1, 0) + t(-1, 0, 1)$
b. a plane in $R^3$ passing through $P(1, 0, 0)$ and parallel to $(-1, 1, 0)$ and $(-1, 0, 1)$


PAN me) 3%, + y+ Zz 
—2x + 3y = 
b. a line through the origin in $R^3$
3 2 
GQxy= eo 2t y= —4¢ z= 
x af ¥ 5h z 
oe om m= — 25441, x2=s, x3=1 
CS xy=1l— 254 it, xg=s5, x3=14+¢ 
27. x= : - 4s- st x2=8, x3=£, x4= 1; The general solution of the associated homogeneous system is x; = — 4s - 3 


particular solution 


True/False 3.4 
(a) True 
(b) False 
(c) True 
(d) True 
(e) False 
(f) True 


Exercise Set 3.5 


1, xX2=s8, x3=8, x4=0.A 


of the given system is x1 


1. a. $(32, -6, -4)$
b. $(-14, -20, -82)$
c. $(27, 40, -42)$
3. $(18, 36, -18)$
5. $(-3, 9, -3)$
7. $\sqrt{59}$
9. $\sqrt{101}$
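The three vectors in Exercise 1 can be reproduced with the component formula for the cross product. The exercise statement is not part of this answer section, so the vectors below, u = (3, 2, -1), v = (0, 2, -3), w = (2, 6, 7), are an assumption; they were chosen because they reproduce all three listed answers.

```python
def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Assumed data (not stated in the answer key).
u, v, w = (3, 2, -1), (0, 2, -3), (2, 6, 7)

print(cross(v, w))            # (32, -6, -4)
print(cross(u, cross(v, w)))  # (-14, -20, -82)
print(cross(cross(u, v), w))  # (27, 40, -42)
```

The last two lines differ, which illustrates that the cross product is not associative.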


19. The vectors do not lie in the same plane. 


21. $-92$
23. abc 


27. ry 


29. $2(\mathbf{v} \times \mathbf{u})$


True/False 3.5 
(a) True 
(b) True 
(c) False 
(d) True 
(e) False 
(f) False 


Chapter 3 Supplementary Exercises 


1. 


19. 


21. 
23. 
25. 
29. 


a. $3\mathbf{v} - 2\mathbf{u} = (13, -3, 10)$
b. $\|\mathbf{u} + \mathbf{v} + \mathbf{w}\| = \sqrt{70}$
c. $\sqrt{774}$


* pro fy = -#(2 -5, -5) 


Qa 


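e. $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = -122$

f. $(-5\mathbf{v} + \mathbf{w}) \times ((\mathbf{u} \cdot \mathbf{v})\mathbf{w}) = (-3150, -2430, 1170)$
a. $3\mathbf{v} - 2\mathbf{u} = (-5, -12, 20, -2)$

b. $\|\mathbf{u} + \mathbf{v} + \mathbf{w}\| = \sqrt{106}$

c. $\sqrt{2810}$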


d. projyu = - Be. 1, —6, —6) 
. Not an orthogonal set 


a. A line through the origin, perpendicular to the given vector. 
b. A plane through the origin, perpendicular to the given vector. 
ce. {0} (the origin) 


d. A line through the origin, perpendicular to the plane containing the two noncollinear vectors. 
. True 
_8(-1, = 1,5) 
. i j4 
17 
ll. 


[35 


Vector equation: $(x, y, z) = (-2, 1, 3) + t_1(1, -2, -2) + t_2(5, -1, -5)$;
parametric equations: $x = -2 + t_1 + 5t_2$, $y = 1 - 2t_1 - t_2$, $z = 3 - 2t_1 - 5t_2$

Vector equation: $(x, y) = (0, -3) + t(8, -1)$;
parametric equations: $x = 8t$, $y = -3 - t$

A possible answer is vector equation: $(x, y) = (0, -5) + t(1, 3)$; parametric equations: $x = t$, $y = -5 + 3t$

$3(x + 1) + 6(y - 5) + 2(z - 6) = 0$

$-18(x - 9) - 51y - 24(z - 4) = 0$

A plane 


Exercise Set 4.1 


1. (a) $\mathbf{u} + \mathbf{v} = (2, 6)$, $3\mathbf{u} = (0, 6)$
(c) Axioms 1-5
3. The set is a vector space with the given operations.
5. Not a vector space. Axioms 5 and 6 fail.
7. The set is a vector space with the given operations.
9. Not a vector space. Axiom 8 fails.
11. The set is a vector space with the given operations.


True/False 4.1


(a) False 
(b) False 
(c) True 
(d) False 
(e) False 


Exercise Set 4.2 


1 
3 
5. 
7. 
9. 


- @, ©, © 
» (a), (b), @) 
» (a), (©), d) 
- (a), ©), @ 
- @), ©), © 


ul. . The vectors span 


a 
b. The vectors do not span 


c. The vectors do not span 


Qa 


. The vectors span 


13. The polynomials do not span 


15. a. Line; x = = St ¥ 


z=t 


= zt, 
b. Line; $x = 2t$, $y = t$, $z = 0$

c. Origin

d. Origin

e. Line; $x = -3t$, $y = -2t$, $z = t$

f. Plane; $x - 3y + z = 0$


True/False 4.2 
(a) True 
(b) True 
(c) False 
(d) False 
(e) False 
(f) True 
(g) True 
(h) False 
(i) False 
(j) True 
(k) False 


Exercise Set 4.3 


1. a. $\mathbf{u}_2$ is a scalar multiple of $\mathbf{u}_1$.
b. The vectors are linearly dependent by Theorem 4.3.3.
c. $\mathbf{p}_2$ is a scalar multiple of $\mathbf{p}_1$.
d. $B$ is a scalar multiple of $A$.

3. None 

5. a. They do not lie in a plane. 


b. They do lie in a plane. 
Ts 2 3 7 3 7 2 
(b) = 4y,—2y2 wot aye war et Sy 
¥1=7¥2— 7V3 Va=9¥1 + O¥3 V3 gts 


9y,=--1 yE 
r pr ASI 


a. They are linearly independent since $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ do not lie in the same plane when they are placed with their initial points at the origin.
b. They are not linearly independent since $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ lie in the same plane when they are placed with their initial points at the origin.
21. $W(x) = -x \sin x - \cos x \neq 0$ for some $x$.
23. a. $W(x) = e^x \neq 0$

b. $W(x) = 2 \neq 0$
25. $W(x) = 2 \sin x \neq 0$ for some $x$.
True/False 4.3 
(a) False 
(b) True 
(c) False 
(d) True 
(e) True 
(f) False 
(g) True 
(h) False 


Exercise Set 4.4 


a. A basis for $R^2$ has two linearly independent vectors.
b. A basis for $R^3$ has three linearly independent vectors.


c. A basis for $P_2$ has three linearly independent vectors.
d. A basis for $M_{22}$ has four linearly independent vectors.


3. (a), (b) 
7 a, GH) s= (3, -7) 


b. = eee 
W)5= Pe a) 
Cc. Ow) = (2 ) 
. W)s= (3, -2,1) 
. ) g= (-2, 0, 1) 
u. A)g=(-1,1, -1,3) 
13. $A = A_1 - A_2 + A_3 - A_4$
15. $\mathbf{p} = 7\mathbf{p}_1 - 8\mathbf{p}_2 + 3\mathbf{p}_3$
17. a. (2,0) 


— 


True/False 4.4 
(a) False 
(b) False 
(c) True 
(d) True 
(e) False 
Exercise Set 4.5 
1. Basis: (1, 0, 1); dimension = 1 
3. Basis: $(4, 1, 0, 0)$, $(-3, 0, 1, 0)$, $(1, 0, 0, 1)$; dimension = 3
5. No basis; dimension = 0 
a 
“u9) (Bay 
b. (1, 1, 0), (0, 0, 1) 
c. (2, = 1,4) 
- (I, 1, 0), (0, 1, 1) 


a 


a.” 

b. 22+ 1) 
2 

c. 2+ 1) 
2 


13. Any two of (0, 1, 0, 0), (0, 0, 1, 0), and (0, 0, 0, 1) can be used. 
15. $\mathbf{v}_3 = (a, b, c)$ with $9a - 3b - 5c \neq 0$
True/False 4.5 

(a) True 

(b) True 

(c) False 

(d) True 

(e) True 

(f) True 

(g) True 

(h) True 

(i) True 

(j) False 


Exercise Set 4.6 


11. 


* [w] s= ial 


[w] s= 


c. - 
te=| bo | 
2 


4 
(W)s=4, -—3,1), wne-| | 
1 


mle Bly 


0 
(Pp) g= (0,2, -—1), wne-| | 
-1 


_ w= (16, 10, 12) 
q=3 4x? 


aac ae at 
o-[5 | 


os 3 


ic} 


15: (a) 3 5 
-1 -2 
(yf 2 5 
-1 -3 
twa =[_{} tle=[—3| 
© pele,=| 7] tla,=|_4| 
ile a 5 
3 2 5 
1 
-2 -3 -3 
5 1 6 
b. 2 
9 2 
[w]z, =| -9|. [wla,=|] 23 
=-5 2 
6 
19. a. | cos 24 — sin 26 
sin 20 —cos 24 
23. 4 B= {(1, 1,0), (1, 0, 2), (0, 2, 1)} 
bp {f4 1 _2\ fl _1 2 
a= (5. ot 3) (5: aes 
True/False 4.6 
(a) True 
(b) True 
(c) True 
(d) True 
(e) False 
(f) False 


Exercise Set 4.7 
1. ry=(2, —1,0,1), r29=(3,5,7, —1), r3= (1,4, 2,7); 


3. 


“al-OL 


not in the column space of A. 


a Rutge 


Fig 


a 


o 


e 


| 
Hel 
ble 

| 


Sora ae 


d. | 6 it 1 7 1 
5 5 5 5 5 
a 4 _3 4 a5 
5 +s 5 +f 5 s| 5 +é 5 
0 1 0 1 0 
0 0 1 0 1 
 « 1 Hl 
ry =[102], r2=[001], y= 3} 1 
A | 
b. 1 23 
0 1 
r= [1-300], 2=[0100], =|) e2=| 5 
0 0 
c. ry = [1245], r=[01 —30], r3=[001 —3], rg= [0001], 
1 2 4 5 
0 1 3 0 
cy =|0], cg=]0], e3= 1|, c4=] —3 
0 0 0 1 
0 0 0 0 
d. ry =[12 —15], r2=[0143], r3=[001 —7], rg=[0001] 
1 2 =| 5 
Cqy= 0 c= 1 c= 4 ct4= 3 
Toh lop % ipPors— | =7 
0 0 0 1 
th og: 1 2 
ry=[1 0 2]; rep=[0 0 1]; cy =] 0]; cg=]1 
0 0 
b 1 =3 
o| 1 
rn=[1 -3 0 0],m=[0 10 0], =| 55 e2=| 4 
0 0 
ec Mm=[1 2 4 5]; r2=[0 1 —3 0]; rx=[0 01 =—3] 
I 2 4 5 
0 1 3 0 
ra=[0 0 0 1); cy =]0]; co=]0]; cg =] 1], cg=] —3 
0 0 0 1 
[0 0 0 0 
a.m=[1 2 -1 5);m=[0 1 4 3]:r3=[0 01 7): 
[1 2 -1 5 
fof taf} af | 3 
rga=[0 0 0 1], 1=[ of e2=| 9) = if c= = 
[0 0 0 1 
Me 1s, mid eB) OG, =D), (0 0,1, -3) 
b. (1, —1,2,0), (0, 1,0, 0), (0 0,1, -3) 
c. (1, 1,0, 0), (0, 1, 1, 1), (0, 0, 1, 1), (, 0, 0, 1) 


17. a. [3a —5a 
3b —5b 


b. Since $A$ and $B$ are invertible, their null spaces are the origin. The null space of $C$ is the line $3x + y = 0$. The null space of $D$ is the entire $xy$-plane.


| for all real numbers a, b not both 0. 


True/False 4.7 
(a) True 
(b) False 
(c) False 


(d) False 
(e) False 
(f) True 
(g) True 
(h) False 
(i) True 
(j) False 


Exercise Set 4.8 

1. Rank($A$) = Rank($A^T$) = 2

3. a. 2; 1

b. 1; 2

c. 2; 2

d. 2; 3

e. 3; 2

5. a. Rank = 4, nullity = 0
b. Rank = 3, nullity = 2

c. Rank = 3, nullity = 0


be 


a. Yes, 0 
b. No 

. Yes, 2 
. Yes, 7 
e. No 

f. Yes, 4 
g. Yes, 0 


ae 


9. $b_1 = r$, $b_2 = s$, $b_3 = 4s - 3r$, $b_4 = 2r - s$, $b_5 = 8s - 7r$
11. No
13. Rank is 2 if $r = 2$ and $s = 1$; the rank is never 1.
17. Pa) 
b. 5 
3 
d. 3 


19. 01 12 
ll al 
True/False 4.8 
(a) False 
(b) True 
(c) False 
(d) False 
(e) True 
(f) False 
(g) False 
(h) False 
(i) True 
(j) False 


Exercise Set 4.9 

1. a. Domain: $R^2$; codomain: $R^3$
b. Domain: $R^3$; codomain: $R^2$
c. Domain: $R^3$; codomain: $R^3$


d. Domain: $R^6$; codomain: $R^1$
3. R4, R3, (= 1,2, 3) 


5. a. Linear; R3 _, R? 
b. Nonlinear; R2 = RP 


c. Linear; p3 _, p3 
d. Nonlinear; p4_, p2 
7. (a) and (c) are matrix transformations; (b), (d), and (e) are not matrix transformations. 


9/3 5 =1 
4-1 1);7(=1,2,4)=(3, —2, =—3) 


3 2-1 
Mooaf o 1 
-1 0 
1 3 
1-1 
bf 72-11 
o1 10 
-10 00 
ce foo0o0 
000 
000 
000 
000 
a foo 01 
10 00 
00 10 
01 00 
10-10 
13. a, T(-1,4)= 6,4) 


b. T(2, 1, = 3) = (0, = 2,0) 


15. a, (2, -5, -3) 


(2,5, 3) 
ce, (= 2, =5, 3) 


> 


17. a, (—2, 1,0) 
b. (2, 0, 3) 
(0, 1, 3) 


b. (0, 1, 292) 


ce. (-1, —2,2) 
(-2, eee a) 
(- 272, 1, 0} 


(1, 2, 2) 


© 


19. 


iJ 


21. 


iJ 


= 


ig 


25. 


wl wloa wl 


I 
so] w]e wilco 


wl) wl wlth 


29. 


iJ 


. Twice the orthogonal projection on the x-axis. 


b. Twice the reflection about the x-axis. 


31. Rotation through the angle $2\theta$.
33. Rotation through the angle $\theta$ and translation by $\mathbf{x}_0$; not a matrix transformation since $\mathbf{x}_0$ is nonzero.
35. A line in $R^n$.


True/False 4.9 
(a) False 
(b) False 
(c) False 
(d) True 


(e) False 


(f) True


(g) False 
(h) False 


(i) True 
Exercise Set 4.10 


1 


11. 


13. 


15. 


a. 


= 


5 =1 21 -—8 -3 1 


TpoTg=|10 -—8 4|, T4goTp=|—-5 -15 -8 


45 3 25 44 -11 45 


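3. $[T_1] = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$, $[T_2] = \begin{bmatrix} 3 & 0 \\ 2 & 4 \end{bmatrix}$; $[T_2 \circ T_1] = \begin{bmatrix} 3 & 3 \\ 6 & -2 \end{bmatrix}$, $[T_1 \circ T_2] = \begin{bmatrix} 5 & 4 \\ 1 & -4 \end{bmatrix}$
$T_2(T_1(x_1, x_2)) = (3x_1 + 3x_2,\ 6x_1 - 2x_2)$,


$T_1(T_2(x_1, x_2)) = (5x_1 + 4x_2,\ x_1 - 4x_2)$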
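The two compositions in Exercise 3 correspond to the two possible orders of matrix multiplication. The sketch below assumes standard matrices $[T_1]$ with rows (1, 1), (1, -1) and $[T_2]$ with rows (3, 0), (2, 4); the second row of $[T_1]$ is inferred from the composition formulas rather than read directly from the answer.

```python
def matmul(X, Y):
    """Multiply matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

T1 = [[1, 1], [1, -1]]   # assumed reading of [T1]
T2 = [[3, 0], [2, 4]]    # [T2] as listed

T2_after_T1 = matmul(T2, T1)  # matrix of T2 o T1 (apply T1 first)
T1_after_T2 = matmul(T1, T2)  # matrix of T1 o T2 (apply T2 first)

print(T2_after_T1)  # [[3, 3], [6, -2]]
print(T1_after_T2)  # [[5, 4], [1, -4]]
```

The rows of each product are the coefficients in the corresponding formula, for example $(3x_1 + 3x_2,\ 6x_1 - 2x_2)$ for $T_2 \circ T_1$.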


1 0 
0 -1 


a. $T_1 \circ T_2 = T_2 \circ T_1$
b. $T_1 \circ T_2 = T_2 \circ T_1$
c. $T_1 \circ T_2 \neq T_2 \circ T_1$


. Not one-to-one 
. One-to-one 
. One-to-one 
. One-to-one 
. One-to-one 


. One-to-one 


. One-to-one 
i 22 
One-to-one; ‘| ; ; T10#1, wa) = (Gn = 
3 3 
. Not one-to-one 
" 0 -1 afi 
One-to-one; i , FOr, wa) =(-— wa, 


. Not one-to-one 


a. Reflection about the x-axis 


b. Rotation through the angle — 4 


a 


» Contraction by a factor of 4 


. Reflection about the yz-plane 
. Dilation by a factor of 5 


Cres 
302 3 


—wy) 


Wy + 


1 


3 


17. 


> fj 


ic} 


19. 


. Matrix operator 
. Not a matrix operator 
. Matrix operator 


. Not a matrix operator 


a. Matrix transformation 


b. Matrix transformation 


21. 


23. 


ip 
[aa 
‘ba 


a. $T_A(\mathbf{e}_1) = (-1, 2, 4)$, $T_A(\mathbf{e}_2) = (3, 1, 5)$, $T_A(\mathbf{e}_3) = (0, 2, -3)$


b. $T_A(\mathbf{e}_1 + \mathbf{e}_2 + \mathbf{e}_3) = (2, 5, 6)$


c. $T_A(7\mathbf{e}_3) = (0, 14, -21)$
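Parts (b) and (c) follow from part (a) by linearity: the image of any vector is the same combination of the images of the standard basis vectors. A brief sketch, using the three images from part (a) as the columns of the transformation:

```python
# Images of e1, e2, e3 from part (a); these are the columns of A.
columns = [(-1, 2, 4), (3, 1, 5), (0, 2, -3)]

def T_A(x):
    """Apply T_A to x = (x1, x2, x3) as x1*col1 + x2*col2 + x3*col3."""
    return tuple(sum(x[k] * columns[k][i] for k in range(3)) for i in range(3))

print(T_A((1, 1, 1)))  # (2, 5, 6)
print(T_A((0, 0, 7)))  # (0, 14, -21)
```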


. Yes 


b. Yes 


27. 


29. 


a. 


b. 


(b) T(x4, x2) = (xf + x4. x12] 


The range of T is a proper subset of 2”. 


T must map infinitely many vectors to 0. 


True/False 4.10 


(a) False 
(b) True 
(c) True 
(d) False 
(e) False 
(f) False 


Exercise Set 4.11 


7. Rectangle wit 


I oor Or fO 

r- OO oor 

eo 6. 6 = © = © oo Fe OO rF COCO KF OO 
SS ee eee 


in 


vertices at (0, 0), (—3, 0), (0, 1), (—3, 1) 


iJ 


au 


3] 
0] 


i. a. Expansion by a factor of 3 in the x-direction 
b. Expansion by a factor of 5 in the y-direction and reflection about the x-axis 
c. Shearing by a factor of 4 in the x-direction 
13. rom ip 
5 0 
05 
b. | 1 0 
25 
ce | O =1 
-1 0 
17. aca 2 
b. Y= 
C5 sys 
ane 
d. y= —2x 
e. 8B te 
a 8+ 5y3 |, 
11 
19. (b) No 
= @ (108 
O1k 
001 
b. Shear in the xz-direction with 


factor k maps (x, y, Z) to (x + ky, yz ky)! 


oor 
ao 
—- © ©& 


Shear in the yz-direction with factor k maps (x, y, 2) to (x, y 4+ kx, z-+ kx)! 


True/False 4.11 


(a) False 
(b) True 
(c) True 
(d) True 
(e) False 
(f) False 
(g) True 


Exercise Set 4.12 


iE 


a. 


Stochastic 


b. Not stochastic 


c. 


Stochastic 


d. Not stochastic 


0.45455 


a ee 


Ds 


a. Regular 


b. Not regular 


ce. Regular 


a aro 
ore & 


9.) 4 
11 
ae 
11 
3. 
11 
11. a. Probability that something in state 1 stays in state 1
b. Probability that something in state 2 moves to state 1 
c. 0.8 
d. 0.85 
13. a. $\begin{bmatrix} 0.95 & 0.55 \\ 0.05 & 0.45 \end{bmatrix}$
b. 0.93 
c. 0.142 
d. 0.63 
15. a.
Year      1       2       3       4       5
City      95,750  91,840  88,243  84,933  81,889
Suburbs   29,250  33,160  36,757  40,067  43,111
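The table in Exercise 15(a) can be regenerated by applying a two-state transition matrix year by year. The exercise data are not part of this answer section, so the starting populations (100,000 city, 25,000 suburbs) and the yearly rates (5% of city residents move out, 3% of suburban residents move in) used below are assumptions; they were chosen because they reproduce every entry of the table.

```python
def step(city, suburbs):
    """One year of migration under the assumed transition rates."""
    new_city = 0.95 * city + 0.03 * suburbs
    new_suburbs = 0.05 * city + 0.97 * suburbs
    return new_city, new_suburbs

state = (100000.0, 25000.0)  # assumed initial populations
table = []
for year in range(1, 6):
    state = step(*state)
    # Round only for display; the unrounded state is carried forward.
    table.append((year, round(state[0]), round(state[1])))

for row in table:
    print(row)
```

Carrying the unrounded values forward matters: rounding each year before the next step could shift later entries by one.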
b. 
17. «23 
100 
b. | 46 
159 
22 
53 
AT 
159 
ce. 35, 50, 35 
19. @.. de 1 
10 10 5 3 
ab An ge 
P=l5 to 2 P9713 
13 3 1 
10 5 10 3 


21. $P^k\mathbf{q} = \mathbf{q}$ for every positive integer $k$


True/False 4.12 
(a) True 
(b) True 
(c) True 
(d) False 
(e) True 
Chapter 4 Supplementary Exercises 

1 () ut+v= 4, 3,2), -u=(—3, 0, 0) 

(c) Axioms 1-5 
3. If $s \neq 1, -2$, the solution space is the origin. If $s = 1$, the solution space is a plane through the origin. If $s = -2$, the solution space is a line through the origin.


7. A must be invertible 


9. a. Rank = 2, nullity = 1
b. Rank = 2, nullity = 2
c. Rank = 2, nullity = $n - 2$

” a At x, ae sos a where 23, = » ifn is even and 2, = » — 1 ifn is odd. 
b. 


{x, x2, x3, — xl 


oro ooo 


— Ot OO 


15. Possible ranks are 2, 1, and 0. 
Exercise Set 5.1 
1.5 


3. a. $\lambda^2 - 2\lambda - 3 = 0$

b. $\lambda^2 - 8\lambda + 16 = 0$
c. $\lambda^2 - 12 = 0$

d. $\lambda^2 + 3 = 0$

e. $\lambda^2 = 0$

f. $\lambda^2 - 2\lambda + 1 = 0$

> pf 


Basis for eigenspace corresponding to A = 3: 


Basis for eigenspace corresponding to A= 4: 


Basis for eigenspace corresponding to A= V 12: 


d. There are no eigenspaces. 


* Basis for eigenspace corresponding to A = 0: 
* Basis for eigenspace corresponding to A = 1: 


7 a. 1,2,3 
b. —¥2,0, #2 
« 8 
d. 2 
e2 
6 4,3 


a. M4443 — 3427 -\42=0 
b. A483 + 19\? — 2444.48 =0 


MW. a, 2] fo -1 
A= 1:basis ‘ . ° ; A= —2:basis ; 
0 1 0 
b 3 
2 
A=4:basis | 1 
0 
0 
3.) (1P_ 1 o9_ 
i; (3) = sty. 2 =512 
15. a. y=xandy=2x 
b. No lines 
c y=0 


True/False 5.1 
(a) False 
(b) False 
(c) True 


iJ 
| as 0 emer: | 


FS Ml eM) 


; A= —1:basis 


0 000 000 
O|,;09 0 1],}/0 0 0 
0 010 001 
. ‘ : 0 
; basis for eigenspace corresponding to A= = 1: 1 
os ne ie 
12 |; basis for eigenspace corresponding to A= — y12 : 12 
1 1 


—2 
1 
1 
0 


(d) False 
(e) True 
(f) False
(g) False 


Exercise Set 5.2 

1. Possible reason: Determinants are different. 
3. Possible reason: Ranks are different. 

5. $\lambda = 0$: 1 or 2; $\lambda = 1$: 1; $\lambda = 2$: 1, 2, or 3
7. Not diagonalizable 

9. Not diagonalizable 
11. Not diagonalizable 


13. fa 
roftebew tg 
[11 = 
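15. $P = \begin{bmatrix} -2 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$, $P^{-1}AP = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}$
17. $P = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 3 & 3 \\ 1 & 3 & 4 \end{bmatrix}$; $P^{-1}AP = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$
19. $P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -3 & 0 & 1 \end{bmatrix}$, $P^{-1}AP = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$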
21. f1 00 O =2 0 0 0 
{0 11-1] 54,5, | 0 -2 00 
P=loo1 of ? “=| o 030 
[ooo 1 0 003 
23. $\begin{bmatrix} -1 & 10237 & -2047 \\ 0 & 1 & 0 \\ 0 & 10245 & -2048 \end{bmatrix}$
25. 1 121 
1 1 mo off 3 6 
A"=Pp"P1=|2 0 -1]//0 3" 0 $ 0 = 
1-1 1}]}0 o 4” ~ i 4 
3 °3 3 
os cociivieP=|*  o lteeaendy fey Baerethe 00 of Seation’S.1 
Nn possi ility 1s = a-\y a—Ay where 1 an 2 are as 1n Exercise ce) ection 5.1. 


33. a. $\lambda = 1$: dimension = 1; $\lambda = 3$: dimension $\le 2$; $\lambda = 4$: dimension $\le 3$
b. Dimensions will be exactly 1, 2, and 3.
c. $\lambda = 4$


True/False 5.2 

(a) True 

(b) True 

(c) True 

(d) False 

(e) True 

(f) True

(g) True 

(h) True 

Exercise Set 5.3 

L. a= (243, -43,1-2), Re (u) = (2,0, 1), m(@) =(-1,4, 1), [lull = f23 
5, x= (7-63, —4— 83, 6 — 127) 

7. a-|,", me Re (=|) | in (4) =| 7! 5 det(A) = 17-3, (A) =1 
i, urv= —1+i, u-w= 18-7), v-w= 124 6 

13, -11-— 143 


i. k= = 3 
b. None 


True/False 5.3 
(a) False 
(b) True 
(c) False 
(d) True 
(e) False 
(f) False 


Exercise Set 5.4 
Ls 


—x 


a. yy =eye™ — 2e7e 


¥2 =cje™* +022 


a y= =e” + ce" 


y2=cje" + 2e9e"* ce" 
y= ene - ce" 
b = 32k 2 3x 
: ype —2e 
yg=er— 20 4. 26% 
y= 2e7* 4. 22% 


7. y=cye™* +7 * 


9. y=cye™ +070" +030" 

True/False 5.4 

(a) False 

(b) False 

(c) True 

(d) True 

(e) False 

Chapter 5 Supplementary Exercises 


1. (b) The transformation rotates vectors through the angle $\theta$; therefore, if $0 < \theta < \pi$, then no nonzero vector is transformed into a vector in the same or opposite direction.
3. {1 1 0 
021 
00 3 
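9. $A^2 = \begin{bmatrix} 15 & 30 \\ 5 & 10 \end{bmatrix}$, $A^3 = \begin{bmatrix} 75 & 150 \\ 25 & 50 \end{bmatrix}$, $A^4 = \begin{bmatrix} 375 & 750 \\ 125 & 250 \end{bmatrix}$, $A^5 = \begin{bmatrix} 1875 & 3750 \\ 625 & 1250 \end{bmatrix}$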
11. 9, tr(A) 


13. They are all 0. 
15. 1 0 0 


1 1 
al 95 
, 1 _1 


17. They are all 0, 1, or $-1$.


Exercise Set 6.1 
1. 


Wo 
pb. —6 
. 3 
a. 13 
e. 5 
f. 89 
3. a, 2 
b. 11 
, =13 
a. -8 
e. 0 
5. a =5 
b. 1 
«7? 
d. 1 
e 1 
f. 1 
7, 3 
. 56 
9. (b) 29 
M. «ffs 0 
0 5 
b. [2 0 
0 V6 
3. a. 74 
b. 0 
15. a fi05 
b. 47 
17. $\langle \mathbf{p}, \mathbf{q} \rangle = 50$, $\|\mathbf{p}\| = 6\sqrt{3}$
19 a. 32 
b. 35 
ce. 3413 
ae 
b. 


as Foo =| . a pthen(”. ¥\= —2<0,so Axiom 4 fails. 


True/False 6.1 
(a) True 
(b) False 
(c) True 
(d) True 
(e) False 
(f) True 
(g) False 


Exercise Set 6.2 
i. 1 


a —— 


fa 
b. ~ 3. 
ier 
0 
d. _—20_ 
9y'10 
e LL 
iD 
f. —2_ 
/55 
3. q —19 
107 
b. 0 
7. No 
9. a. $k = -3$
b. $k = -2, -3$
13. No
15. a. $x = t$, $y = -2t$, $z = -3t$
b. $2x - 5y + 4z = 0$
c. $x - z = 0$
31. 


a. The line y= —x 
b. The xz-plane 


c. The x-axis 


True/False 6.2 
(a) False 

(b) True 

(c) True 

(d) True 

(e) False 

(f) False 
Exercise Set 6.3 
1. (@), (6), @) 
3. (b), (d) 

5. (a) 


9 


5 


v2 + 4v3 


Ve 


5 


In 


v2 + Ov3 + 


u 
10 


inn 


a= 


11. 


om 
1] 
el (on SA 
| i=] 
3 8 
cola 
1 + 
a 
s «2 


14 

3 
=. 

V6 


a. w= 
b. w 


13. 


as 
Pease | 
mas 


! ! 
Sols 
mr calar 
lA ler 
! ! 


—— 


mn me 
EA z 
oa 
™ owls 
| 1 
a 
| 1 
cayan war 


b. ¥1 = (1,9), v2=(0, = 1) 


= J 
= 
= 
| -| — 
= 1 ols 
a a4 cal 
als a ] 
© P 
I aS |= 
re II — 
all= a ll 
> a 
1 oF 
_ oe ge 
-(< ~ wim 
lS as 
° 
alle 4K os 


o 4 Ive 3y2 
Blo | 

1 1 

yo 

1 _8_ 
234 

2 _u_|? 3 

7 234 ]) , 26 

2 _7_ 3 

3 234 

A ot 4 

fo Ws We |? v2 v2 

cae ee 

1 41 _1ffo o0 + 

2 ys Ye V6 

1 2 3 iy v9 3y2 

y2 2f19 19 2 19 
af2 1 flo of —E 
yi9 19 419 | 


f. Columns not linearly independent 
33. $\mathbf{v}_1 = 1$, $\mathbf{v}_2 = \sqrt{3}(2x - 1)$, $\mathbf{v}_3 = \sqrt{5}(6x^2 - 6x + 1)$
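The three polynomials in Exercise 33 can be checked for orthonormality with exact arithmetic. The sketch below assumes the inner product $\langle p, q \rangle = \int_0^1 p(x)q(x)\,dx$, which is not stated in this answer section. To stay exact it works with the polynomials without their square-root factors and checks that the squared norms are 1, 1/3, and 1/5, so that the factors 1, $\sqrt{3}$, and $\sqrt{5}$ normalize them.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def integral_0_1(p):
    """Exact integral of a polynomial over [0, 1]."""
    return sum(Fraction(c) / (k + 1) for k, c in enumerate(p))

def inner(p, q):
    """Assumed inner product: integral of p*q over [0, 1]."""
    return integral_0_1(poly_mul(p, q))

w1 = [1]          # 1
w2 = [-1, 2]      # 2x - 1
w3 = [1, -6, 6]   # 6x^2 - 6x + 1

print(inner(w1, w2), inner(w1, w3), inner(w2, w3))  # 0 0 0
print(inner(w1, w1), inner(w2, w2), inner(w3, w3))  # 1 1/3 1/5
```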
True/False 6.3 


(a) False 
(b) False 
(c) True 
(d) True 
(e) False 
(f) True 


Exercise Set 6.4 


1, fe 


b. 


— 


Fis 
~» 


iw 


' xy=5, x2= 


* Solution: x = ( 


21 25|[*1]_ | 20 
E selz2\= [| 
15-1 5/)[*1 -1 
-1 22 30 - 9 
5 30 45 || *3 13 


i 
2 


xy = 12, x9= —3, x3=9 


| rpo polo 


o 

ll 

I 
WOWW Ww 


7 a least squares error: 4/5 
2 
7 


* Solution: x = ( 0) + £(—3, 1) (ta real number); least squares error: 442 


© Solution: x = (-Z. Z 0) +é(—1, —1, 1) (¢areal number); least squares error: 4 294 
9 a. (7,2,9, 5) 
b. (- 12 _4 12 3) 
ee ia ie | 
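11. a. $\det(A^TA) = 0$; $A$ does not have linearly independent column vectors.
b. $\det(A^TA) = 0$; $A$ does not have linearly independent column vectors.
13. a. $[P] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
b. $[P] = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
15. a. $(1, 0, -5)$, $(0, 1, 3)$
b. $[P] = \frac{1}{35}\begin{bmatrix} 10 & 15 & -5 \\ 15 & 26 & 3 \\ -5 & 3 & 34 \end{bmatrix}$
c. $\left( \frac{2x_0 + 3y_0 - z_0}{7},\ \frac{15x_0 + 26y_0 + 3z_0}{35},\ \frac{-5x_0 + 3y_0 + 34z_0}{35} \right)$
d. $\frac{3\sqrt{35}}{7}$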
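The projection matrix in Exercise 15(b) can be recomputed from the basis in part (a) with the formula $[P] = A(A^TA)^{-1}A^T$, where the columns of $A$ are the basis vectors $(1, 0, -5)$ and $(0, 1, 3)$. The sketch below is a minimal check with exact fractions; it inverts the 2 x 2 Gram matrix by hand rather than calling a linear algebra library.

```python
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a1, a2 = (1, 0, -5), (0, 1, 3)   # basis from part (a)
vectors = (a1, a2)

# Gram matrix A^T A and its inverse (2x2 adjugate formula).
g = [[dot(a1, a1), dot(a1, a2)], [dot(a2, a1), dot(a2, a2)]]
det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
g_inv = [[Fraction(g[1][1], det), Fraction(-g[0][1], det)],
         [Fraction(-g[1][0], det), Fraction(g[0][0], det)]]

# P[i][j] = sum over k, l of a_k[i] * g_inv[k][l] * a_l[j]
P = [[sum(vectors[k][i] * g_inv[k][l] * vectors[l][j]
          for k in range(2) for l in range(2))
      for j in range(3)] for i in range(3)]

scaled = [[entry * 35 for entry in row] for row in P]
print(det)     # 35
print(scaled)  # entries equal to 10, 15, -5 / 15, 26, 3 / -5, 3, 34
```

The determinant of the Gram matrix is 35, which is where the common factor $\frac{1}{35}$ in part (b) comes from.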
17. $s = t = 1$


21. $[P] = A^T(AA^T)^{-1}A$


True/False 6.4 
(a) True 

(b) False 

(c) True 

(d) True 

(e) False 

(f) True 

(g) False 

(h) True 
Exercise Set 6.5 


bye cla 
¥ 3 5x 


3. $y = 2 + 5x - 3x^2$


True/False 6.5 
(a) False 
(b) True 
(c) False 
(d) True 


Exercise Set 6.6 


Log, (+a) —2 sinx —sin 2x 
b. ee (er sin 2x, sin 3x sin x 
(+n) 2f sin x + 2 4 SOE yy SME | 
3 a 1 1__ 3x 
oo eel” 
b. 13 4 l+e 


& 2: 


S| 


b. po & 
1 2 


9414 8 Lf 2 iF si 
$+ elt (=-1) |sin x 
True/False 6.6 
(a) False 
(b) True 
(c) True 
(d) False 
(e) True 
Chapter 6 Supplementary Exercises 
1a. (0,4, 4, 0) witha #0 


b. 
rs ae oe 
is" ¥5 
3. a. The subspace of all matrices in $M_{22}$ with only zeros on the diagonal.


b. The subspace of all skew-symmetric matrices in $M_{22}$.


Ms 1 1 
+ |—=,0, = 
te ‘a 
9. No 
11. 


5 


(b) @ approaches 5 


17. No 


Exercise Set 7.1 


b @] 4.9 1 
5 25 25 
0 3 3 
3 _12 16 
5 2o/ ‘25 
3 ia 
01 
) f 1 
y2 2 
ae eee 
y2 2 
Mf. 9 
y2 y2 
ate 4 Jl 
ve yo 6 
1 21 4 
v3 y3 ¥3 
(e) 


I 
Ale Ale vAlA wl 


Ale Al wale w]e 


Mle Mle Mle Mle 
AA Ale Ale rl 


7 a (—14+343, 3+ 93} 
b. (3-¥3. 343 + 1) 


a gf 4S S51 
( 373032 5-5 3) 


b. f1 3 ~3_ 
( 3,6, -2 


Oo 1 0 
sn@ 0 cos@ 
b. i 0 0 

A=!/0 cos@ sind 
0 =sinf cosé 


il. a. io 0 =siné 


$e ss p=sy 


13. a? +b? = 4 


os 


a 2 
joo ee 


a. Rotations about the origin, reflections about any line through the origin, and any combination of these 


17 


* The only possibilities are # = 


21. 


b. Rotation about the origin, dilations, contractions, reflections about lines through the origin, and combinations of these 


c. No; dilations and contractions 


True/False 7.1 

(a) False 

(b) False 

(c) False 

(d) False 

(e) True 

(f) True 

(g) True 

(h) True 

Exercise Set 7.2 

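1. a. $\lambda^2 - 5\lambda = 0$; $\lambda = 0$: one-dimensional; $\lambda = 5$: one-dimensional

b. $\lambda^3 - 27\lambda - 54 = 0$; $\lambda = 6$: one-dimensional; $\lambda = -3$: two-dimensional
c. $\lambda^3 - 3\lambda^2 = 0$; $\lambda = 3$: one-dimensional; $\lambda = 0$: two-dimensional
d. $\lambda^3 - 12\lambda^2 + 36\lambda - 32 = 0$; $\lambda = 2$: two-dimensional; $\lambda = 8$: one-dimensional
e. $\lambda^4 - 8\lambda^3 = 0$; $\lambda = 0$: three-dimensional; $\lambda = 8$: one-dimensional


f. $\lambda^4 - 8\lambda^3 + 22\lambda^2 - 24\lambda + 9 = 0$; $\lambda = 1$: two-dimensional; $\lambda = 3$: two-dimensional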
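For a symmetric matrix the dimension of each eigenspace equals the multiplicity of the eigenvalue as a root of the characteristic polynomial, so each characteristic equation above should factor with the listed dimensions as exponents. The sketch below assumes the matrices in this exercise are symmetric (the subject of the section) and rebuilds each polynomial from its roots for comparison.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def from_roots(roots):
    """Monic polynomial with the given roots, repeated by multiplicity."""
    poly = [1]
    for r in roots:
        poly = poly_mul(poly, [-r, 1])  # multiply by (x - r)
    return poly

# Coefficients are listed from the constant term up to the leading term.
print(from_roots([6, -3, -3]))    # [-54, -27, 0, 1]
print(from_roots([2, 2, 8]))      # [-32, 36, -12, 1]
print(from_roots([1, 1, 3, 3]))   # [9, -24, 22, -8, 1]
```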


* | 2. 
7 7 
a WO ptgeal? °] 
3 2 0 10 
(7 7 
5. [4 9 2 
5° 5 25 0 0 
P=! 01 o|; P74P=| 0-3 0 
394 0 0 50 
lo" 5 
Se |e od Wk 
v3 y2 6 
Pe lame ees leva 
y3 y2 ¥6 ae 
a. “iy 
(3 (6 
9 |_4 3 
—3 = 00 
34 0 0 —25 0 0 0 
| 5 5 —ptyp_| 0 25 0 0 
ies esr S| o 0 =-25 0 
5 5 0 0 0 25 
00 24 
15. No 
19. Yes 


True/False 7.2 


(a) True 


(b) True 
(c) False 
(d) True 
(e) True 
(f) False 
(g) True 
Exercise Set 7.3 
seonfe 
O 7] *2 
wal $ SE 
=—3 =—9||*2 
c. 9 3 =4 
x 
[x1x2%3]| 2 7! le] 
x 
4 3 4 |L*3 
2 


2 


11. 


13. 
15. 
17. 


19. 
21. 
23. 
27. 
OL, 


» Ox? 5y? — 6xy 


1 
eee 
y2 


Y1). 7 er 
ie C= 39, +3 


22 1 
x1 3 3 3 Iryy 
x2|= 2 4 2 y2|, Q=y? +43 +793 
x3 12 6 ¥3 
L 3 3 3 
a 2 1 
2 |[x x 
[x ¥] 1 [> ]+t-161[}]+2=0 
= 0 
2 
b. 0 O}[x x 
alk i>] + 07-21} ]-5=° 


a. ellipse 
b. hyperbola 
c. parabola 


d. circle 


Hyperbola: 4 (xr)? — (yr)? = 3; 0 = 36.9° 


a. Positive definite 

b. Negative definite 

c. Indefinite 

d. Positive semidefinite 


e. Negative semidefinite 


Positive definite 
Positive semidefinite 
Indefinite 
k>2 
a. 1 z 1 
” n(x—1) 
a oe Bl 
A= n(a—1) 2 
1 1 


ht —1) ate 1) 


b. Yes 


Hyperbola: $2(y')^2 - 3(x')^2 = 8$; $\theta \approx -26.6^\circ$


33. A must have a positive eigenvalue of multiplicity 2. 


True/False 7.3 
(a) True 
(b) False 
(c) True 
(d) True 
(e) False 
(f) True 
(g) True 
(h) True 
(i) False 
(j) True 
(k) False 
(l) False


Exercise Set 7.4 

Maximum: 5 at (1, 0) and (-1, 0); minimum: -1 at (0, 1) and (0, -1)
Maximum: 7 at (0, 1) and (0, -1); minimum: 3 at (1, 0) and (-1, 0)

Maximum: 9 at (1, 0, 0) and (-1, 0, 0); minimum: 3 at (0, 0, 1) and (0, 0, -1)
Maximum: $z = 4\sqrt{2}$ at $(x, y) = (2\sqrt{2}, 2)$ and $(-2\sqrt{2}, -2)$; minimum: $z = -4\sqrt{2}$ at $(x, y) = (-2\sqrt{2}, 2)$ and $(2\sqrt{2}, -2)$


au ww = 


2 


13. Critical points: (—1, 1), relative maximum; (0, 0), saddle point 
15. Critical points: (0, 0), relative minimum; (2, 1) and (—2, 1), saddle points 


17. ea 
Corner points: * = (2 = 
2 y2 


21. g(x) =A 
True/False 7.4 
(a) False 

(b) True 

(c) True 

(d) False 

(e) True 


Exercise Set 7.5 
1. a*=| oe 4 a | 


1+i 3-7 0 
3 1 i 2-33 
A=| -j 3 1 
2+3%i 1 2 
5. a, 413 #231 
b. 422 #472 
9. 3B 4 
* = 5 5 
A =A1= 
4; 3; 
5 5 
Ms -i+y3  1-iy 


iy3 

‘ 2y2 2y2 
7 1 +793 -i- ¥3 

af2 2y2 


13. -l1+i 1-3 


15. Steet <3 
je 3 20 
sa er ee o=|5 
ye 3 
17. [ o 0 1 
2. ott 9 200 
p-| 46 6 D=| 010 
Iti 2 4 005 
je 6 
19. . oO i 2—33 
A=| i 0 1 
=oa5p ay 


21. a. 413% — 231 
b. 411% 41 


29. (c) B and C must commute.
S40. 

y2 2 

ae os 

v2 y2 


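39. Multiplication of $\mathbf{x}$ by $P$ corresponds to $\|\mathbf{u}\|^2$ times the orthogonal projection of $\mathbf{x}$ onto $W = \operatorname{span}\{\mathbf{u}\}$. If $\|\mathbf{u}\| = 1$, then multiplication of $\mathbf{x}$ by $H = I - 2\mathbf{u}\mathbf{u}^*$ corresponds to reflection of $\mathbf{x}$ about the hyperplane $\mathbf{u}^\perp$.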


True/False 7.5 
(a) False 
(b) False 
(c) True 
(d) False 
(e) False 


Chapter 7 Supplementary Exercises 


aT3 _4 34 
5 5) | 5 5 
4 3| ~|_4 3 
> 5 a 3 
bf 45 _3) [4 _9 2 
5 5 5 2a. 25 
9 4 2] _| 45 4 3 
23 5 25 5 3 
1223 «616| |_3 _12 16 
25 5 25 5 25 25 
5. 1 1 
2° 


7. positive definite 


9. a. parabola 


b. parabola 


Exercise Set 8.1 
1. Nonlinear 
3. Linear 


5. Linear 


a. Linear 

b. Nonlinear 

9. $T(x_1, x_2) = (-4x_1 + 5x_2,\ x_1 - 3x_2)$; $T(5, -3) = (-35, 14)$

11. $T(x_1, x_2, x_3) = (-x_1 + 4x_2 - x_3,\ 5x_1 - 5x_2 - x_3,\ x_1 + 3x_3)$; $T(2, 4, -1) = (15, -9, -1)$
13. $T(2\mathbf{v}_1 - 3\mathbf{v}_2 + 4\mathbf{v}_3) = (-10, -7, 6)$

15. (a) 

17. (a) 

19. (a) 

aw (4) 

b. (1,0,0), (0, 1,0), ie. = 1) 


C. x, x2, x 


23. 4 f1] [=1 


b. | —14 

19 

11 
c. Rank(T) = 2, nullity(T) = 1
d. Rank(A) = 2, nullity(A) = 1


* fF 


b. | <1 —4 
-1 2 

1 0 

0 fi 


c. Rank(T) = nullity(T) = 2
d. Rank(A) = nullity(A) = 2


27. a. Kernel: y-axis; range: xz-plane 
b. Kernel: x-axis; range: yz-plane 


c. Kernel: the line through the origin perpendicular to the plane y = x; range: plane y = x 


29. a. Nullity(T) = 2
b. Nullity(T) = 4
c. Nullity(T) = 3
d. Nullity(T) = 1
31. a 3 


b. No 
33. A line through the origin, a plane through the origin, the origin only, or all of $R^3$
35. (h) No 
41. ker(D) consists of all constant polynomials. 
8a TG) =F O@) 
b. TH @)) = FOP) 


True/False 8.1 
(a) True 
(b) False 
(c) True 
(d) False 
(e) True 
(f) True 
(g) False 
(h) False 
(i) False 


Exercise Set 8.2 


ii, 


a. ker(T) = {0}; T is one-to-one


is 


an 


* ker(T) = {r( - 2, iy} T is not one-to-one 


ker(T) = {0}; T is one-to-one
ker(T) = {0}; T is one-to-one
. ker(T) = (C1, 1)} ; Tis not one-to-one 


f. ker(T) = {k(0, 1, -1)}; T is not one-to-one


a. Not one-to-one 


b. Not one-to-one 


. One-to-one 


a. ker(T) = {k(-1, 1)}


a 


Qa 


d. sd 
Ta + sin(x) +c cos(x)) = + 


T is not one-to-one since ker(T) ≠ {0}.
T is one-to-one
T is not one-to-one
T is not one-to-one
T is one-to-one
a 
la be b 
bd e|l=|5 
al @ =lq 
ce f e 
f 
a a 
[a b b ab é 
Tl? i=l eke = 
(2 e))-|ef 72 a}}-[s 
a ad 


a 
: Tax? + bx? + cx) = A 
c 


c 


13. Tis not one-to-one since, for example, f (x) = - (x— i is in its kernel. 


15. Yes; it is one-to-one 


17. T is not one-to-one since, for example a is in its kernel. 
19. Yes 


True/False 8.2 
(a) False 
(b) True 
(c) False 
(d) True 
(e) False 
(f) False 


Exercise Set 8.3 


1. a. $(T_2 \circ T_1)(x, y) = (2x - 3y,\ 2x + 3y)$

b. $(T_2 \circ T_1)(x, y) = (4x - 12y,\ 3x - 9y)$

c. $(T_2 \circ T_1)(x, y) = (2x + 3y,\ x - 2y)$

d. $(T_2 \circ T_1)(x, y) = (0,\ 2x)$

ated 

$(T_3 \circ T_1)(A)$ does not exist since $T_1(A)$ is not a $2 \times 2$ matrix.


8. Ta) = 49 


11. 


a 


. T has no inverse. 


1, +143, —-355 


x 8 8 4 
al ee eee ere 
L [)- grit gta t 4x3 
eae eee eo 
grit gta t 4x3 
é 5 ee ra! 
page le 
ol =| ol, 41541 
T | %2|= px 5x24 5%3 
oles ne re 
Xl + 5%2 = 533 
d. [xy 3x1 + 3x2—%3 
a "- 2x1 — 2xg4+%3 
L*3 —4x1 —5xq+ 2x3 


13. a. ay #0 fori=1, 2,3,...,% 


—1 1 1 1 1 
* FO (x1, X2, 3, 4 Xn) = (ar =—X2, [-%3, --4 5—-Xn 
1 n 


15. 


x 
7. @ (1, -1) 
(@) T(2, 3) =24x 


21. a. $T_1 \circ T_2 = T_2 \circ T_1$
b. $T_1 \circ T_2 \ne T_2 \circ T_1$
c. $T_1 \circ T_2 = T_2 \circ T_1$


True/False 8.3 
(a) True 
(b) False 
(c) False 
(d) True 
(e) False 
(f) True


Exercise Set 8.4
Ls: 


4 
0 
0 
0 
1 


a. 


wilco Mle oo 


oe 


1 
24 
0 4 
10x + 16x? 


ap! 
0 
i) 
= 
* trole=|_)} trooie=[5| 


ro =[_3), 70) =[55 | 


Ae)-)2 2] 


i” 


sR pe 


© T(x) = 22, TH w@) = Pe- Ds (0 T) M@) = toe -1) 


co jf 
~88 iS 


Ul. a. 1 3 -1 
[Tep]g=]2), (Twale=| 0}, [Twale=|] 5 
6 -2 4 


= 


$T(\mathbf{v}_1) = 16 + 51x + 19x^2$, $T(\mathbf{v}_2) = -6 - 5x + 5x^2$, $T(\mathbf{v}_3) = 7 + 40x + 15x^2$


c T(ao + ayx +427) _ 23949 — ae + 289a2 + 201lag = a + 2472 x+ aime + 1072 x? 


a 


$T(1 + x^2) = 22 + 56x + 14x^2$


13. a, 0 60 000 
6 0 300 > < 
[F207] ei p= » [Falpiar= [7Tilg"g=|0 -3 
Oo +9 03 0 0 0 
Oo 60 00 3 
b. [72071] e¢2=([7alee"(Tilen, 
1% afoo 0 
00-1 
01 0 
b. [0 0 0 
010 
00 2 
ce |2 1 0 
022 
00 2 
d. 210 4 14 
142?" — 8xe7* — 20x20" since] 0 2 2 6)=] -8 
0 0 2]| —10 —20 
21. a. BI, Bu 
b B! Blu 
True/False 8.4 
(a) False 
(b) False 
(c) True 
(d) False 
(e) True 
Exercise Set 8.5 
1. _ 3) 38 
1 =2 11 11 
[7le=| ] [Tle = 2 & 
11 11 
3. jt. 13— 0 25 
melt | oy | 1 
[Tlz= i. [Tl z= 5 9 
v2 ¥2 y211y2 
5. [1 0 0 100 
[7]p=|0 1 O}, [F]y=]o 1 1 
[0 0 0 000 
7 [ose 
3 9 
[Tlz= 1 af (a=| 
[2 3 
i. a 1 1 
o={[} Lal} 


13. a. $\lambda = -4$, $\lambda = 3$


b. Basis for eigenspace corresponding to \= —4: —2+4 g, ++ x*; basis for eigenspace corresponding to \ = 3: 5— 2x x2 


21. The choice of an appropriate basis can yield a better understanding of the linear operator. 
True/False 8.5 

(a) False 

(b) True 

(c) True 

(d) True 

(e) True 

(f) False 

(g) True 

(h) False 


Chapter 8 Supplementary Exercises 


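1. No. $T(\mathbf{x}_1 + \mathbf{x}_2) = A(\mathbf{x}_1 + \mathbf{x}_2) + B \ne (A\mathbf{x}_1 + B) + (A\mathbf{x}_2 + B) = T(\mathbf{x}_1) + T(\mathbf{x}_2)$, and if $c \ne 1$, then $T(c\mathbf{x}) = cA\mathbf{x} + B \ne c(A\mathbf{x} + B) = cT(\mathbf{x})$.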
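The failure of additivity and homogeneity in answer 1 can be checked numerically. The following is a minimal sketch, assuming a 2×2 matrix A, a nonzero vector B, and test vectors that are all hypothetical and chosen only for illustration; they do not come from the exercise.

```python
# Hypothetical data: A, B, x1, x2 and c are illustrative, not taken from the exercise.
def T(x, A, B):
    """Affine map T(x) = A x + B for 2-vectors given as tuples."""
    return tuple(sum(A[i][j] * x[j] for j in range(2)) + B[i] for i in range(2))

A = ((1, 2), (3, 4))
B = (1, -1)              # any nonzero B breaks linearity
x1, x2 = (1, 0), (0, 1)

# Additivity test: compare T(x1 + x2) with T(x1) + T(x2).
lhs = T((x1[0] + x2[0], x1[1] + x2[1]), A, B)
rhs = tuple(a + b for a, b in zip(T(x1, A, B), T(x2, A, B)))
additive = lhs == rhs

# Homogeneity test with c != 1: compare T(c x1) with c T(x1).
c = 2
homogeneous = T((c * x1[0], c * x1[1]), A, B) == tuple(c * t for t in T(x1, A, B))

print(lhs, rhs, additive, homogeneous)
```

Both comparisons fail for this choice of data, because the right-hand sides pick up an extra copy of B.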
5. a. $T(\mathbf{e}_3)$ and any two of $T(\mathbf{e}_1)$, $T(\mathbf{e}_2)$, and $T(\mathbf{e}_4)$ form bases for the range; (-1, 1, 0, 1) is a basis for the kernel.
b. Rank = 3, nullity = 1


7. a. Rank(T) = 2 and nullity(T) = 2


b. T is not one-to-one.


11. Rank = 3, nullity = 1
13. $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

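15. $[T]_B = \begin{bmatrix} -4 & 0 & 9 \\ 1 & 0 & -2 \\ 0 & 1 & 1 \end{bmatrix}$

17. $[T]_B = \begin{bmatrix} 1 & -1 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & -1 \end{bmatrix}$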


19.) f@)=2, gx) =1 
(©) f(@) =e", g(x) =e* 


21. (d) The points are on the graph. 
25.10 0 0 «=. 0 
10 0 «s+. 0 
050 0 
0 0 4 0 
00 0 st 


Exercise Set 9.1 

1. $x_1 = 2$, $x_2 = 1$

3. $x_1 = 3$, $x_2 = -1$

5. $x_1 = -1$, $x_2 = 1$, $x_3 = 0$
7. $x_1 = -1$, $x_2 = 1$, $x_3 = 0$

9. $x_1 = -3$, $x_2 = 1$, $x_3 = 2$, $x_4 = 1$
11. 


a 20 o]]1 5 -5 
A=LU=|-21 01], 5 4 
201, 5 4 

b. 10 0/2 0 o]]1 5 -3 
A=LiDU;=| <1 1 o||0 1 0 

iC4lee ud 2 

00 1 


c. 10 0//2 1 -1 
A=1nU2=|-1 1 of]0 0 1 
10 1}//0 0 1 
13. 1 O O}/3 O O}} 1 -—4 2 
A=|0 1 O}/0 2 0 10 
2-2 1|/0 0 1 01 
15.x,-21 ,--4 ,,-2 


True/False 9.1 
(a) False 

(b) False 

(c) True 

(d) True 

(e) True 
Exercise Set 9.2 
1. a. 3 dominant 

b. No dominant eigenvalue 


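3. $\mathbf{x}_1 \approx \begin{bmatrix} 0.98058 \\ -0.19612 \end{bmatrix}$, $\mathbf{x}_2 \approx \begin{bmatrix} 0.98837 \\ -0.15206 \end{bmatrix}$, $\mathbf{x}_3 \approx \begin{bmatrix} 0.98679 \\ -0.16201 \end{bmatrix}$, $\mathbf{x}_4 \approx \begin{bmatrix} 0.98715 \\ -0.15977 \end{bmatrix}$;


dominant eigenvalue: $\lambda = 2 + \sqrt{10} \approx 5.16228$;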


beoy es 1 1 
dominant eigenvector: i - (| ad lo iis 
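The iterates listed in answer 3 are what the power method with Euclidean scaling produces. The sketch below assumes the matrix A = [[5, -1], [-1, -1]] and the starting vector (1, 0); neither is stated in this answer key, and both were chosen because they are consistent with the listed iterates and with the eigenvalue 2 + √10.

```python
import math

def power_method(A, x, steps):
    """Power method with Euclidean scaling.

    Returns the unit vector after `steps` iterations and its Rayleigh quotient.
    """
    for _ in range(steps):
        y = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    lam = sum(p * q for p, q in zip(Ax, x))  # valid because x has unit length
    return x, lam

A = [[5, -1], [-1, -1]]   # assumed matrix, not given in the answer key
x0 = [1.0, 0.0]           # assumed starting vector

x1, _ = power_method(A, x0, 1)
x4, _ = power_method(A, x0, 4)
_, lam = power_method(A, x0, 50)
print(x1, x4, lam)
```

After enough iterations the Rayleigh quotient settles at the dominant eigenvalue, which for this assumed matrix is 2 + √10.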
Ogi ae ir oe ea 30256: sony Rew! \® we 6.60550: 


1 
x4 Fe mew AO we 6.60555; 


dominant eigenvalue: $\lambda = 3 + \sqrt{13} \approx 6.60555$;


3 


Perea ‘| ¥26+4¥13 | 7_oarig6 
omuinant eigenvector: 24. is tad 0.88167 
26 + 4y'13 


Ocal BL weal! 2 Leieh 2 
=! _o5} “27 | -og} 4 *| —0.929 


b. $\lambda^{(1)} = 2.8$, $\lambda^{(2)} \approx 2.976$, $\lambda^{(3)} \approx 2.997$


" F : : : 1 
7 Dominant eigenvalue: }, — 3; dominant eigenvector: 


d. 0.1% 
9. 0.99180 
2.99993; 

: F oon00| 

13. iis 1 
Starting with | 0 |, it takes 8 iterations. 

0 

b. 1 
Starting with ' , it takes 8 iterations. 

0 


Exercise Set 9.3 


1. 1 2 
hg =| 2], ag=] 0 
2 3 
3: 0.39057 0.60971 
hy = | 0.65094 |, ay = 0 
0.65094 0.79262 


5. Sites 1 and 2 (tie); sites 3 and 4 are irrelevant 
7. Site 2, site 3, site 4; sites 1 and 5 are irrelevant 
Exercise Set 9.4 
1. a. ≈ 0.067 second

b. ≈ 66.68 seconds

c. ≈ 66,668 seconds, or about 18.5 hours
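The three times in answer 1 follow from a flop count divided by a machine speed. The sketch below assumes the cost formula (2/3)n³ + (3/2)n² − (7/6)n for solving an n×n system by Gauss-Jordan elimination, a speed of 10 gigaflops per second, and system sizes n = 10³, 10⁴, 10⁵; these are assumptions that reproduce the listed values, not data quoted from the exercise.

```python
def gauss_jordan_flops(n):
    """Assumed flop count for solving an n-by-n linear system by Gauss-Jordan elimination."""
    return (2 * n**3) / 3 + (3 * n**2) / 2 - (7 * n) / 6

FLOPS_PER_SECOND = 10e9  # assumed machine speed: 10 gigaflops per second

def solve_time_seconds(n):
    """Estimated running time in seconds for an n-by-n system."""
    return gauss_jordan_flops(n) / FLOPS_PER_SECOND

for n in (10**3, 10**4, 10**5):
    t = solve_time_seconds(n)
    print(n, t, t / 3600)  # size, seconds, hours
```

The cubic term dominates, so multiplying n by 10 multiplies the time by roughly 1000.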
3. a. ≈ 9.52 seconds

b. ≈ 0.0014 second

c. ≈ 9.52 seconds


d. ≈ 28.6 seconds


5. a. $6.67 \times 10^5$ s for forward phase, 10 s for backward phase
b. 1334 
7. flops 


9. $2n^3 - n^2$ flops


Exercise Set 9.5 


1. 0, 5 
3. f5 
an | ee 
d= y2 2 |} ¥2 all | 
1 Lilo eile 1 
vt 
: an (ie eee a 2. 
[0 Ele of ee 
A 2 [9 2y;_2 1 
{5 5 {5 ¥5 
_ 2 2 
y2 § 2 
4 1 22 3y2 0 y2 y2 
“Jz S =z] 0 Off po 
2 a ya |b? ° v2 v2 
3 2 6 
nd oe or ae 
Bo Fle 
ele ale, ele 10 
A=! “fs y2  ¥6\| ° lls | 
eds 1 LIL? ° 
y3 y2 ¥6 
True/False 9.5 
(a) False 
(b) True 
(c) False 
(d) False 
(e) True 
(f) False 
(g) True 


Exercise Set 9.6 


wwf vo|po 


2 
3 
3 a 
1 Li|l¥3 ° Ir o 
xa vale 1 
1 4 
ys ¥2 
5. 2 
Wa 1] 
ALE Te 
2 7 
3 
Re 1 = 
8 1 
43 faa 
5 v2 


9. 70,100 numbers must be stored; A has 100,000 entries 
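The count in answer 9 matches the usual storage estimate for a rank-k approximation from a reduced singular value decomposition: k singular values, k left singular vectors with m entries each, and k right singular vectors with n entries each. The sketch below assumes m = 500, n = 200, and k = 100; these dimensions are hypothetical values that happen to reproduce both numbers and are not quoted from the exercise.

```python
def rank_k_storage(m, n, k):
    """Numbers stored for a rank-k SVD approximation of an m-by-n matrix.

    Each of the k terms needs one singular value, one m-vector and one n-vector.
    """
    return k * (m + n + 1)

m, n, k = 500, 200, 100          # assumed dimensions and rank (hypothetical)
compressed = rank_k_storage(m, n, k)
full = m * n                     # entries in the uncompressed matrix
print(compressed, full, compressed / full)
```

Compression only pays off when k(m + n + 1) is smaller than mn, which is why small ranks are used in practice.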
True/False 9.6 

(a) True 

(b) True 

(c) False 


Chapter 9 Supplementary Exercises 


1] 2 0)/-3 1 
[2 sll 2] 
) 
0 
2 


3. |2 0 1.2 3 
12 012 
11 001 


a. 1 
\=3 V2 
econ fe 
(2 
be pen [9-710] | [0.7071 
3*!o7041 | 0.7071 
een ee 
3*) 0.9918 
id as a i 
20 
2 2 
01 olfo o os a 
ae ee 
y2 y2 
11 ee 
2 2 
120 6] Ja ia 221 2 
4 -8 10]_|2 “2|f24 0]/3 “3 3 
4 =-8 10 1 _1|[0 12}}2 2 _1 
12 0 6] |2 2 2 
11 
2 2 


Exercise Set 10.1 


1. a. $y = 3x - 4$


b. $y = -2x + 1$


2. a. $x^2 + y^2 - 4x - 6y + 4 = 0$ or $(x - 2)^2 + (y - 3)^2 = 9$
b. $x^2 + y^2 + 2x - 4y - 20 = 0$ or $(x + 1)^2 + (y - 2)^2 = 25$
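A circle equation of the form in answer 2 can be recovered from three points by solving a small linear system for the coefficients. The sketch below uses Cramer's rule with 3×3 determinants rather than expanding the single 4×4 determinant of the text, and the three sample points are hypothetical: they were picked to lie on the circle of answer 2(a) and are not the points given in the exercise.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def circle_through(p1, p2, p3):
    """Return (D, E, F) such that x^2 + y^2 + D x + E y + F = 0 passes through the points."""
    pts = [p1, p2, p3]
    M = [[Fraction(x), Fraction(y), Fraction(1)] for x, y in pts]
    rhs = [Fraction(-(x * x + y * y)) for x, y in pts]
    d = det3(M)
    if d == 0:
        raise ValueError("the three points are collinear")
    solution = []
    for col in range(3):
        Mc = [row[:] for row in M]
        for r in range(3):
            Mc[r][col] = rhs[r]      # Cramer's rule: replace one column by the right side
        solution.append(det3(Mc) / d)
    return tuple(solution)

# Hypothetical points on (x - 2)^2 + (y - 3)^2 = 9.
D, E, F = circle_through((5, 3), (2, 6), (-1, 3))
print(D, E, F)
```

Exact rational arithmetic is used so that the recovered coefficients can be compared with the printed equation without rounding.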


3. $x^2 + 2xy + y^2 - 2x + y = 0$ (a parabola)
4. a. $x + 2y + z = 0$
b. $-x + y - 2z + 1 = 0$
5. alx y z 0 
x1 ¥1 z1 1 
x2 y2 22 1 
x3 3 23: 1 


is 


~x-2y+z=05 =x+y—2z2=0 
8. a. $x^2 + y^2 + z^2 - 2x - 4y - 2z = -2$ or $(x - 1)^2 + (y - 2)^2 + (z - 1)^2 = 4$
b. $x^2 + y^2 + z^2 - 2x - 2y = 3$ or $(x - 1)^2 + (y - 1)^2 + z^2 = 5$


10.)y x? x 1 


IY. xp x 1 

yg x3 x2 1 

y3 - x31 
11. The equation of the line through the three collinear points 
12, 0=0 
13. The equation of the plane through the four coplanar points 


Exercise Set 10.2 


1. xj, =2,49= 2, maximum value of z = 2 

2. No feasible solutions 

3. Unbounded solution 

4. Invest $6000 in bond A and $4000 in bond B; the annual yield is $880. 
Bo 


. 9 cup of milk, 2 ounces of corn flakes; minimum cost = 3 = 18.68 


6. a. $x_1 \ge 0$ and $x_2 \ge 0$ are nonbinding; $2x_1 + 3x_2 \le 24$ is binding

b. x1 —x2 =v for py = — 3 is binding and for »y = — 6 yields the empty set. 

c. x2 <v for y < 8 is nonbinding and for y = Q yields the empty set. 
7. 550 containers from company A and 300 containers from company B; maximum shipping charges = $2110 
8. 925 containers from company A and no containers from company B; maximum shipping charges = $2312.50 


9. 0.4 pound of ingredient A and 2.4 pounds of ingredient B; minimum cost = 24.88 
Exercise Set 10.3 


1. 700 
2. a. 5 
b. 4 
4 a. Ox, a units; sheep, 0 unit 


b. First kind, x measure; second kind, 5 measure; third kind, x measure 


= me x= ee j= Gj—X1,t=2, 3,” 


b. Exercise 7(b); gold, 305 minae; brass, 94 minae; tin, 143 minae; iron, 54 minae 


6. a. $5x + y + z - K = 0$
$x + 7y + z - K = 0$
$x + y + 8z - K = 0$
$x = \frac{21t}{131}$, $y = \frac{14t}{131}$, $z = \frac{12t}{131}$, $K = t$, where $t$ is an arbitrary number


b. Take $t = 131$, so that $x = 21$, $y = 14$, $z = 12$, $K = 131$.


c. Take $t = 262$, so that $x = 42$, $y = 28$, $z = 24$, $K = 262$.


a. Legitimate son, sre staters; illegitimate son, 4222 staters 


b. Gold, 305 minae; brass, 94 minae; tin, 143 minae; iron, 54 minae 


©. First person, 45; second person, 374; third person, 225 


Exercise Set 10.4 


a. $S(x) = -.12643(x - .4)^3 - .20211(x - .4)^2 + .92158(x - .4) + .38942$
b. $S(.5) = .47943$; error = 0%


3. a. The cubic runout spline
b. $S(x) = 3x^3 - 2x^2 + 5x + 1$

4. $S(x) = \begin{cases} -.00000042(x+10)^3 + .000214(x+10) + .99815, & -10 \le x \le 0 \\ .00000024x^3 - .0000126x^2 + .000088x + .99987, & 0 \le x \le 10 \\ -.00000004(x-10)^3 - .0000054(x-10)^2 - .000092(x-10) + .99973, & 10 \le x \le 20 \\ .00000022(x-20)^3 - .0000066(x-20)^2 - .000212(x-20) + .99823, & 20 \le x \le 30 \end{cases}$
Maximum at $(x, S(x)) = (3.93, 1.00004)$
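The maximum quoted for answer 4 can be reproduced by setting the derivative of the spline piece on 0 ≤ x ≤ 10 to zero. The sketch below assumes that piece is S(x) = .00000024x³ − .0000126x² + .000088x + .99987; the leading sign is a reading of the damaged source that makes the pieces join continuously, so treat the coefficients as an assumption.

```python
import math

# Assumed coefficients of the spline piece on 0 <= x <= 10 (reconstructed reading).
A3, A2, A1, A0 = 0.00000024, -0.0000126, 0.000088, 0.99987

def S(x):
    """Cubic spline piece, valid for 0 <= x <= 10."""
    return A3 * x**3 + A2 * x**2 + A1 * x + A0

# S'(x) = 3*A3*x^2 + 2*A2*x + A1; solve the quadratic for its roots.
a, b, c = 3 * A3, 2 * A2, A1
disc = b * b - 4 * a * c
x_max = (-b - math.sqrt(disc)) / (2 * a)   # smaller root, where S''(x) < 0
s_max = S(x_max)
second_derivative = 2 * a * x_max + b

print(x_max, s_max, second_derivative)
```

The smaller root lies inside the interval and has a negative second derivative, so it is the interior maximum of this piece.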

5. $S(x) = \begin{cases} .00000009(x+10)^3 - .0000121(x+10)^2 + .000282(x+10) + .99815, & -10 \le x \le 0 \\ .00000009x^3 - .0000093x^2 + .000070x + .99987, & 0 \le x \le 10 \\ .00000004(x-10)^3 - .0000066(x-10)^2 - .000087(x-10) + .99973, & 10 \le x \le 20 \\ .00000004(x-20)^3 - .0000053(x-20)^2 - .000207(x-20) + .99823, & 20 \le x \le 30 \end{cases}$
Maximum at $(x, S(x)) = (4.00, 1.00001)$
6 4 3 
; = <x<0. 
Six) = * oe O0<x<05 
4x? = 12x°+9x—1 05<x<1 
b. 2-2x 05<x<1 
sw= [re 1<x<15 
c. The three data points are collinear. 

7. _ 

ea, a M, Yn-1 ay1 + ¥2 
1410 --- 000 0]| 4% Yio- 242 +  ¥3 
0141 -++ 000 0]] 4 |_6] »2 - 43 + ya 
pbb ae aa : 2? : 
0000 +--+ 01 4 1]/ M, > Yn-3  — 2¥n-2 + Ont 
1000 :-:. 001 4 My-1 yn? — Wn + yy 

8. 

(b) = —— 
2100 ooo] a aot 7 
1410 0000]; M2 yt = aya bO¥3 
0141 000 0]| M3 |_ 6 y2 -— wz + ye 

i : ne : 

0000 004 1]/My-4 Yn2  - Wat + On 
0000 011 2]/ a, Jaae. 4 ge cae 


Exercise Set 10.5 
loa. ein a x= Bi Ox ee xO= Fac | yOu eee 


54 546 5454 54546 
b. 5. 
P is regular since all entries of P are positive; q = 
i 


re 7 23 273 
x=] 2], xP =] 52], xO =] 396 


72 
: ‘ 2 Pie oe. 
P is regular, since all entries of P are positive: q = 5D 
21 
72 
By a. [9 
7 
Bu 
1? 
b. | 26 
45 
19 
45 
ce | 3. 
19 
Ae 
19 
12 
19 
4. a 1 J ‘ 
Pp" = ij , #=1,2,.... Thus, no integer power of P has all positive entries. 
1-(2) 1 
(3) 
: 00 0 
se pr, E i as n increases, so p"xO _, H for any x©) as n increases. 
“: The entries of the limiting vector H are not all positive. 
6. did 1 
24 4 3 
ean |e aa en 
a 424 has all positive entries; q 3 
dd 1 
442 3 
7, 10 


13 
8. 541% in region 1, 162% in region 2, and 291% in region 3 


Exercise Set 10.6 


loa foo001 
1011 
1101 
0000 
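b. $\begin{bmatrix} 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}$
c. $\begin{bmatrix} 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 0 \end{bmatrix}$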
2. a Es P, 


P, Ps 


P, P, 
Py P, 
c. | | 7 | i 
P 6 Ps Py 
P, 
P, P; 


b. 1—step: Pi P32 
2—step: Py + P4— P2 
Pi 34 P34 Py 
3—step: Py + P23P) P32 
Pi 4 P34 P44 Pp 
Pi > P44 P34 P32 
c. l=step: Py Pq 
2—step: Py - P34 P4 
3—step: Py —- P24 P,P, 
Pia Py P34 Py 


0 


oo oO 
oorofm 
orre OO 
— NO © 
-—- OO & 


0 0 
(c) The $ij$th entry is the number of family members who influence both the $i$th and $j$th family members.
5) a, {P1, Pa, P3} 
b. (P3, Pa, Ps} 
c. (Po, P4, Ps, Pg} and {P4, Ps, Pg} 


a. None 
b. {P3, Pa, Pe} 
7. $\begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{bmatrix}$
Power of $P_1 = 5$
Power of $P_2 = 3$
Power of $P_3 = 4$
Power of $P_4 = 2$
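The powers in answer 7 are the row sums of M + M², where M is the vertex matrix of the dominance-directed graph. The sketch below reads the matrix rows from the answer as 0011, 1000, 0101, 0100; the function name is hypothetical.

```python
def vertex_powers(M):
    """Power of each vertex: row sums of M + M^2 (one-step plus two-step dominances)."""
    n = len(M)
    M2 = [[sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [sum(M[i]) + sum(M2[i]) for i in range(n)]

# Vertex matrix as read from answer 7.
M = [
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 0],
]

powers = vertex_powers(M)
print(powers)
```

Sorting the vertices by these values gives the ranking P1, P3, P2, P4.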


8. First, A; second, B and E (tie); fourth, C; fifth, D 
Exercise Set 10.7 


lg, -5/8 
b. [0 1 0] 


e [1000]? 
S Let A= if: for example. 


a. * * 
oS 0 TI, @ -|‘| oe 


M5 [0D Oy, =| pas 


p =(001], qa 


1 
0 
c. 0 
1 
0 


a. 1 
p= (3a) T= |7) <3 
8 
b 1 
p=[3 3h a =|) y= 
6 
c 1 
p=(10), a°=[)], v=3 
0 
d. 3 
p=(3 3} @=]) dae: 
5 
e. Ad 
p=|3 3} =| > we 13 
1 
5 id 
IS * | 20 __3 
p=|33 mb =|’ v= ~ 30 
20 


c. | 78 
54 
79 


J 


. Use Corollary 10.8.4; all row sums are less than one. 


b. Use Corollary 10.8.5; all column sums are less than one. 
e; 2 1.9 
Use Theorem 10.8.3, withx=|1]>Cx=] 9]. 
1 J 


. EB? has all positive entries. 


. $1256 for the CE, $1448 for the EE, $1556 for the ME 
(b) 342 
503 
Exercise Set 10.9 
1. The second class; $15,000 
2. $223 
3, 1:1.90:3.02:4.24:5.00 
5 
6 


3 
4. Price of tomatoes, $120.00; price of corn, $100.00; price of lettuce, $106.67 
s 
6 


-si(gii+eyi t+ ++ +e)) 
2 1:2:3: + a= 


Exercise Set 10.10 


1 64f0 110 
0 
0 


Oo 


of 
oN 


(b) 


0 .866 1.366 .500 
0 = a _ ~ 
i) 


(0, 0,0), (1, 0,0), (i 1,0), and (5,1, 0} 


(c) (0,0, 0), (1,.6,0), (1, 1.6, 0), (0, 1, 0) 


a. 


R= 


1 90 
0-1 0 
0. 60 


ee 
Hie 10 


a 
2 
0 ° 
0 0 sin 20 


SON 
SON 


cos(—45) 0 sin(—45) 0 
Ma= 0 1 0 , M5=/1 
=sin(—45) 0 cos(—45) 0 


Pr =MsM4Mx(M P+ My) 


00 1 wy @ 2 11 
5 0}, M2=|]0 cos45 —sin45 |, Mz=|0 0 
01 0.0 


3 
M,=| 0 
9 0 sin45 cos45° 


. M3=]0 cos 20° 


0 
=sin 20° 
cos 20° 


cos35 0 sin 35° cos(—45°) —sin(—45') 0 


Ma=| 9 1 0 |, M5=| sin¢—45°) cos(—45) 0} 


—sin 35° 0 cos 35° 0 0 


00 -:-. 0 2 
Mg=|0 0 --- O], My=]0 
11 1 0 


oro 
rt OO 


Pr = M7 (M5 M (MMP + M3) + M6) 


cosG 0 sn@ cosa =—sina 0 
0 1 O |, R2=] sina cosa Of], 
=sing 0 cos@ 0 0 1 


cos# 0 sné cosa sina 0 
0 1 O |, 84=] sina cosa 0], 
=-snf 0 cosé 0 Oo 1 


cos 8 0 —sn@ 
0 1 0 
snG@ 0 cos@ 


1 


ll 
ee 
ooreo 


oroe 
rWwWM GH oo 


oo oe 
oor]d 


Exercise Set 10.11 


1. 


a. 


ee 
Ww 

Oo ff Bl Oo 
oO 


Blo Blo Blu Blo 


Mu 


NH oO MH oO 
2 
| 


v0 


20 


oOo Be 
oOo 


fy 
£2 
£3 
£4 


° 
Oo Blo Blo 


Blo 


caltr cal colin colo 


Mle OM Oo 


16 
i 
16 


d. for ¢, and tz, ~12,9%; for £2 and #4, 5.2% 


Exercise Set 10.12 
' Oxts (3 3 


2. 


22° 22 
a. x6 — (1.40000 
xs’ = (1.41000 
x5’ = (1.40900 
x5” = (1.40910 
x5’ = (1.40909 
x; = (1.40909 


b. Same as part (a) 


= a = (9.55000, 25.65000) 


16 


, 1.20000) 
, 1.23000) 
, 1.22700) 
, 1.22730) 
, 1.22727) 
, 1.22727) 


2 
16 


xy = (59500, — 1.21500) 


xs) = (1.49050, 1.47150) 
x5” = (1.40095, 1.20285) 
x5” = (1.40991, 1.22972) 
x; = (1.40901, 1.22703) 


21 
16 


16 
16 


, tO-t= 


4. xf =(1, 1)x) = (2, 0). x3 = (1, 1) 
7. $x_7 + x_8 + x_9 = 13.00$
$x_4 + x_5 + x_6 = 15.00$
$x_1 + x_2 + x_3 = 8.00$
$.82843(x_6 + x_8) + .58579x_9 = 14.79$
$1.41421(x_3 + x_5 + x_7) = 14.31$
$.82843(x_2 + x_4) + .58579x_1 = 3.81$
$x_3 + x_6 + x_9 = 18.00$
$x_2 + x_5 + x_8 = 12.00$
$x_1 + x_4 + x_7 = 6.00$
$.82843(x_2 + x_6) + .58579x_3 = 10.51$
$1.41421(x_1 + x_5 + x_9) = 16.13$
$.82843(x_4 + x_8) + .58579x_7 = 7.04$
8. $x_7 + x_8 + x_9 = 13.00$
$x_4 + x_5 + x_6 = 15.00$
$x_1 + x_2 + x_3 = 8.00$
$.04289(x_3 + x_5 + x_7) + .75000(x_6 + x_8) + .61396x_9 = 14.79$
$.91421(x_3 + x_5 + x_7) + .25000(x_2 + x_4 + x_6 + x_8) = 14.31$
$.04289(x_3 + x_5 + x_7) + .75000(x_2 + x_4) + .61396x_1 = 3.81$
$x_3 + x_6 + x_9 = 18.00$
$x_2 + x_5 + x_8 = 12.00$
$x_1 + x_4 + x_7 = 6.00$
$.04289(x_1 + x_5 + x_9) + .75000(x_2 + x_6) + .61396x_3 = 10.51$
$.91421(x_1 + x_5 + x_9) + .25000(x_2 + x_4 + x_6 + x_8) = 16.13$
$.04289(x_1 + x_5 + x_9) + .75000(x_4 + x_8) + .61396x_7 = 7.04$


Exercise Set 10.13 
13] [0 B 
x y2/1 olfx a |. so 0 25 25 
{[?]\ 222 = = in(4) ftn{ 22) = 1.888... 
7(>]} ah Ml+[F} 1,2, 3,4, where the four values of 7, are Hi 2 | and | $? |: des) =(4) (3) 1.888 


2. gx 47; dx(S) x In(4) /In(1/.47) = 1.8. ... Rotation angles: 9° (upper left); —99° (upper right); 129° (lower left); 19° (lower right); 
(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (0, 0, 1), (0, 0, 2), C1, 2, 9), (2, 1, 3), (2,0, 1), (2, 0, 2), (2, 2, 0), (0, 3, 3) 


a. (i) $s = \frac{1}{3}$; (ii) all rotation angles are 0°; (iii) $d_H(S) = \ln(7)/\ln(3) = 1.771\ldots$. This set is a fractal.

b. (i) $s = \frac{1}{2}$; (ii) all rotation angles are 180°; (iii) $d_H(S) = \ln(3)/\ln(2) = 1.584\ldots$. This set is a fractal.

c. (i) $s = \frac{1}{2}$; (ii) rotation angles: 90° (top); 180° (lower left); 180° (lower right); (iii) $d_H(S) = \ln(3)/\ln(2) = 1.584\ldots$. This set is a fractal.

d. (i) $s = \frac{1}{2}$; (ii) rotation angles: 90° (upper left); 180° (upper right); 180° (lower right); (iii) $d_H(S) = \ln(3)/\ln(2) = 1.584\ldots$. This set is a fractal.


s=.8509..,0= —2. 69°... 


(0.766, 0.996) rounded to three decimal places 
$d_H(S) = \ln(16)/\ln(4) = 2$


$\ln(4)/\ln(3) = 1.2618\ldots$


oat her as 


9. $d_H(S) = \ln(8)/\ln(2) = 3$; the cube is not a fractal.
10. $k = 20$; $s = \frac{1}{3}$; $d_H(S) = \ln(20)/\ln(3) = 2.726\ldots$; the set is a fractal.
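The dimensions in answers 9 and 10 all come from the same ratio of logarithms for a self-similar set built from k copies, each scaled by a factor s. The sketch below assumes the formula d = ln(k) / ln(1/s); the function name is hypothetical.

```python
import math

def self_similar_dimension(k, s):
    """Dimension ln(k) / ln(1/s) of a self-similar set made of k copies scaled by s."""
    if k < 1 or not 0 < s < 1:
        raise ValueError("need k >= 1 and 0 < s < 1")
    return math.log(k) / math.log(1 / s)

cube = self_similar_dimension(8, 1 / 2)      # answer 9: 8 half-size copies
sponge = self_similar_dimension(20, 1 / 3)   # answer 10: 20 one-third-size copies
print(cube, sponge)
```

An integer result, as for the cube, signals an ordinary solid rather than a fractal.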


11. 
Initial set 


ae First iterate 


Second iterate 


Third iterate 
~  Pourth iterate 


$d_H(S) = \ln(2)/\ln(3) = 0.6309\ldots$


12. Area of $S_0 = 1$; area of $S_1 = \frac{8}{9} = 0.888\ldots$; area of $S_2 = \left(\frac{8}{9}\right)^2 = 0.790\ldots$; area of $S_3 = \left(\frac{8}{9}\right)^3 = 0.702\ldots$; area of $S_4 = \left(\frac{8}{9}\right)^4 = 0.624\ldots$


Exercise Set 10.14 
1. $\Pi(250) = 750$, $\Pi(25) = 50$, $\Pi(125) = 250$, $\Pi(30) = 60$, $\Pi(10) = 30$, $\Pi(50) = 150$, $\Pi(3750) = 7500$, $\Pi(6) = 12$, $\Pi(5) = 10$
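The periods in answer 1 can be checked by brute force. The sketch below assumes that the function in question gives the period of Arnold's cat map on a p×p pixel grid, that is, the smallest n for which the n-th power of the matrix C = [[1, 1], [1, 2]] is the identity modulo p; the function name is hypothetical.

```python
def cat_map_period(p):
    """Smallest n >= 1 with C^n = I (mod p), where C = [[1, 1], [1, 2]]."""
    if p < 2:
        raise ValueError("p must be at least 2")
    a, b, c, d = 1, 1, 1, 2 % p        # current power of C, reduced mod p
    n = 1
    while (a, b, c, d) != (1, 0, 0, 1):
        # Multiply on the left by C: [[1, 1], [1, 2]] * [[a, b], [c, d]].
        a, b, c, d = (a + c) % p, (b + d) % p, (a + 2 * c) % p, (b + 2 * d) % p
        n += 1
    return n

for p in (5, 6, 10, 25, 30, 50, 125, 250):
    print(p, cat_map_period(p))
```

The loop always terminates because C has determinant 1 and is therefore invertible modulo every p.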


2. One l-cycle: {(0, 0}} ; one 3-cycle: @ 0], 3, é} (0 2\\; two 4-cycles: {(¢ ‘ i. a E 0}, (3. 2\\ and (0 a ie é} (0 ae i 2\\; 
wo vt (04) (3) 0-8) (4) (4) 8) 4) 6-2} -2) 8) 8) (8) 
(Oo) 03} 62) G-2} a} G-# Goh a} Gh 8) 6-2} Bal} mom 


3. (a) 3, 7, 10, 2, 12, 14, 11, 10, 6, 1, 7, 8, 0, 8, 8, 1, 9, 10, 4, 14, 3, 2, 5, 7, 12, 4, 1, 5, 6, 11, 2, 13, 0, 13, 13, 11, 9, 5, 14, 4, 3, 7, ...
(c) (5, 5), (10, 15), (4, 19), (2, 0), (2, 2), (4, 6), (10, 16), (5, 0), (5, 5), ...


(c) — a eS — ee eee 
The frst five iterates of (T5 0) are (sor gor) (Gor ter} Gor ter} Tar ha ( 7 } 


() The matrices of Anosov automorphisms are F | aad 5 Al 


(c) The transformation effects a rotation of S through 90° in the clockwise direction.


9 Na (1, 1) (1) C/A (Lb 


[>] F 1 a 
(0. i its el. 1/2) [>] F 3 E +(e] 


(0, ass (1,0) (0.0) (1/2,0)) (1.0) 


, ont | 7] =| 2 fej ont: |? ]=] ° fej ont |?) =| 7! |; ra eal Cake 
n region [Ble 0 ; In region [ele =i > In region [Ble = ; In region [ele =) 


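12. $\left(\frac{1}{5}, \frac{3}{5}\right)$ and $\left(\frac{4}{5}, \frac{2}{5}\right)$ form one 2-cycle, and $\left(\frac{2}{5}, \frac{1}{5}\right)$ and $\left(\frac{3}{5}, \frac{4}{5}\right)$ form another 2-cycle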


14. Begin with a 101 x 101 array of white pixels and add the letter ‘A’ in black pixels to it. Apply the mapping to this image, which will scatter the black pixels 
throughout the image. Then superimpose the letter ‘B’ in black pixels onto this image. Apply the mapping again and then superimpose the letter ‘C’ in black pixels 
onto the resulting image. Repeat this procedure with the letters ‘D’ and ‘E’. The next application of the mapping will return you to the letter ‘A’ with the pixels for 
the letters ‘B’ through ‘E’ scattered in the background. 

Exercise Set 10.15 


1. a. GIYUOKEVBH 


b. SFANEFZWJH 
2 a ,4 [12 7 
a -|5 ia| 


b. Not invertible 


c g-t_| 1 19 
A =|,4 | 


d. Not invertible 
e. Not invertible 


f. ,-1_[15 12 
A= 
Ei | 


3. WE LOVE MATH 

* Deciphering matrix = E a enciphering matrix = H A 
5. THEY SPLIT THE ATOM 

6. I HAVE COME TO BURY CAESAR 


7. a. 010110001 


b. | 11 
114 
Loge 


8. A is invertible modulo 29 if and only if det(A) ≢ 0 (mod 29).
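The criterion in answer 8 is the prime-modulus case of a general test: a square matrix is invertible modulo m when its determinant is relatively prime to m. The sketch below handles 2×2 matrices only; the function names and the sample matrices are hypothetical, and `pow(d, -1, m)` needs Python 3.8 or later.

```python
from math import gcd

def det2(A):
    """Determinant of a 2x2 integer matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def is_invertible_mod(A, m):
    """True when det(A) is relatively prime to m (for prime m: det(A) is nonzero mod m)."""
    return gcd(det2(A) % m, m) == 1

def inverse_mod(A, m):
    """Inverse of a 2x2 matrix modulo m via the adjugate formula."""
    d = det2(A) % m
    if gcd(d, m) != 1:
        raise ValueError("matrix is not invertible modulo m")
    d_inv = pow(d, -1, m)            # modular inverse of the determinant
    (a, b), (c, e) = A
    return [[(e * d_inv) % m, (-b * d_inv) % m],
            [(-c * d_inv) % m, (a * d_inv) % m]]

A = [[1, 2], [3, 4]]                 # hypothetical matrix, det = -2
singular = [[2, 4], [1, 2]]          # hypothetical matrix, det = 0
A_inv = inverse_mod(A, 29)
print(is_invertible_mod(A, 29), is_invertible_mod(singular, 29), A_inv)
```

With the 26-letter alphabet the same test rules out determinants divisible by 2 or 13, while a prime modulus such as 29 only rules out determinants that are multiples of 29.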


Exercise Set 10.16 
2; 


n+l 
an=4+(5) (a9 —co) an 5 
bn=t ee 
n=> n= n= ) as %—00 
n+l 1 
en=4-(3) (a9 —<) eu 4 
4: 
tant = 3 + ear (2ao — bo — 4e0) 
1 _ n=0,1,2,... 
ban+41 3 Gay" (2ag — 4g — 4eq) 
Con+1 =0 
a2 => + 1 (2ag — 49 —4e) 
™ 12” 6(4)” 
bm= > n=1,2 


ee eee ee, ee ee 
©In= 75 64) (2aq — 4g — 4eq) 


* Eigenvalues: Ay = 1, Az = 4: eigenvectors: e, = ial en= 4 


5. 12 generations; .006% 
6. 1 1 ] 
345 gurl 3-44 95)" + (-34 990-45)" 
1 1 n+l n+l 
3 ger (+95) +95) I 1 
a. 1 n = n 0 
3 gaat 1+ ¥5)" + = f5)] ; 
x= xe) as # 00 
2.14 5)" + 1 = ¥5)"I : 
3 antl Leo 0 
1 1 n+l n+l 1 
3 er +95) +95) 2 
+1 +1 
S45 ger 3-4 + 95)" + (-3 4 50-95)" 
8/1 0 0 0 
0000 
0000 
0001 


Exercise Set 10.17 


b. 2° =|}: 20 =|" ale eA 2° =| 


50 50 88 125 191 
& LO _ 7yO_]857] ,OryO_ | 855 
aust es ie 287 
7; 2375 
8. 1.49611 


Exercise Set 10.18 


Yield = 335% of population; xj = 


Yield = 45.2% of population; xj = ; harvest 57.9% of youngest age class 


a 
ole mle uw 
Reo w]e OL 


2. 1.000 2.090 
845 B45 
824 824 
795 795 
759 755 
_ | .699 _ | .699 1.090 + .418 _ 
=) go5) | go5|" sea 
532 532 
0 418 
0 0 
0 0 
0 0 
4, hy =(R-1)f (apbyhgs + Beat + + + + ayb1b2- + + by-1) 
5. p,— @1tagbi t+ +++ + (as-1b1b2: + + by-2)-1 
apbybos bpp + + bay_ybyh2° + -by-2 
Exercise Set 10.19 
1. x 4 
3 +4 cost + cos 2f-+ ocos 3t 
TFT og pe 1h cos 4p bcos 8p 4-1 cos BF 
3 + = cos pit cos pit 3 cos [qa cos ) 


T?/ 2m, 1 4a 1 oe, 1 Oe 
(sin he + 3 sin He + 3 sin Fe + 5 sin Fr) 


1 _ 1 
5-7 ~On—DQr +1) 


1 10n¢ 1 aunt 
oF Ce SE os. 
P10? P (2n)? Pr 


cos at) 


Exercise Set 10.20 


1. 2 2 


a. Yes; y= al + 5Y2 + 5¥3 


b. No; v= 2y, + 4y2 - iy; 


& Yes; v= ey, + 342 + Ow3 
6 5 


d. Yes; v= ivi + 152+ 15 


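2. $m$ = number of triangles = 7, $n$ = number of vertex points = 7, $k$ = number of boundary vertex points = 5; Equation (7) is $7 = 2(7) - 2 - 5$.
3. $\mathbf{w} = M\mathbf{v} + \mathbf{b} = M(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3) + (c_1 + c_2 + c_3)\mathbf{b}$
$= c_1(M\mathbf{v}_1 + \mathbf{b}) + c_2(M\mathbf{v}_2 + \mathbf{b}) + c_3(M\mathbf{v}_3 + \mathbf{b}) = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3$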
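The identity in answer 3 says that an affine map carries a convex combination of the triangle's vertices to the same combination of their images, provided the coefficients sum to 1. The sketch below checks this with exact fractions; the matrix M, the vector b, the vertices, and the coefficients are all hypothetical values chosen for illustration.

```python
from fractions import Fraction

def affine(M, b, v):
    """Apply w = M v + b to a 2-vector."""
    return tuple(M[i][0] * v[0] + M[i][1] * v[1] + b[i] for i in range(2))

def combination(cs, vs):
    """Linear combination c1*v1 + c2*v2 + c3*v3 of 2-vectors."""
    return tuple(sum(c * v[i] for c, v in zip(cs, vs)) for i in range(2))

# Hypothetical affine map and triangle.
M = ((2, 1), (0, 3))
b = (5, -1)
v1, v2, v3 = (0, 0), (4, 0), (0, 4)
cs = (Fraction(1, 4), Fraction(1, 4), Fraction(1, 2))   # coefficients sum to 1

v = combination(cs, [v1, v2, v3])
w_direct = affine(M, b, v)
w_from_vertices = combination(cs, [affine(M, b, u) for u in (v1, v2, v3)])
print(v, w_direct, w_from_vertices)
```

The two results agree only because the coefficients sum to 1, which is what lets the single copy of b be split across the three terms.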


4. a] V2 


IX 


V3 


v4 


vw fbf 
vel pL 
ane 
elt 


a. Two of the coefficients are zero. 
b. At least one of the coefficients is zero. 


c. None of the coefficients are zero. 


& aly 41,41 
gl + 3v2+ 3¥3 


I 


= 




